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Abstract 

^vq Most existing work uses dual deconiposition and first-order methods to solve Net- 

work Utility Maximization (NUM) problems in a distributed manner, which suffer from 
r \ slow rate of convergence properties. This paper develops an alternative distributed 

/^ Newton-type fast converging algorithm for solving NUM problems with self-concordant 

utility functions. By using novel matrix splitting techniques, both primal and dual up- 
"^ dates for the Newton step can be computed using iterative schemes in a decentralized 

3 manner. We propose a stepsize rule and provide a distributed procedure to compute 

^ G^ it in finitely many iterations. The key feature of our direction and stepsize compu- 

tation schemes is that both are implemented using the same distributed information 
^^ exchange mechanism employed by first order methods. We show that even when the 

*K^ Newton direction and the stepsize in our method are computed within some error (due 

CO to finite truncation of the iterative schemes), the resulting objective function value 

^^ still converges superlinearly in terms of primal iterations to an explicitly characterized 

error neighborhood. Simulation results demonstrate significant convergence rate im- 

O 



provement of our algorithm relative to the existing first-order methods based on dual 



O decomposition. 
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1 Introduction 

Most of today's cominunication networks are large-scale and comprise of agents with hetero- 
geneous preferences. Lack of access to centralized information in such networks necessitate 
design of distributed control algorithms that can operate based on locally available infor- 
mation. Some applications include routing and congestion control in the Internet, data 
collection and processing in sensor networks, and cross-layer design in wireless networks. 
This work focuses on the rate control problem in wireline networks, which can be formulated 
in the Network Utility Maximization (NUM) framework proposed in [22] (see also [25], [33] , 
and [llj). NUM problems are characterized by a fixed network and a set of sources, which 
send information over the network along predetermined routes. Each source has a local util- 
ity function over the rate at which it sends information. The goal is to determine the source 
rates that maximize the sum of utilities subject to link capacity constraints. The standard 
approach for solving NUM problems relies on using dual decomposition and subgradient (or 
first-order) methods, which through a price exchange mechanism among the sources and 
the links yields algorithms that can operate on the basis of local informationj^ One major 
shortcoming of this approach is the slow rate of convergence. 

In this paper, we propose a novel Newton-type second-order method for solving the 
NUM problem in a distributed manner, which leads to significantly faster convergence. Our 
approach involves transforming the inequality constrained NUM problem to an equality- 
constrained one through introducing slack variables and logarithmic barrier functions, and 
using an equality-constrained Newton method for the reformulated problem. There are two 
challenges in implementing this method in a distributed manner. First challenge is the 
computation of the Newton direction. This computation involves a matrix inversion, which 
is costly and requires global information. We solve this problem by using an iterative scheme 
based on a novel matrix splitting technique. Since the objective function of the (equality- 
constrained) NUM problem is separable, i.e., it is the sum of functions over each of the 
variables, this splitting enables computation of the Newton direction using decentralized 
algorithms based on limited "scalar" information exchange between sources and links. This 
exchange involves destinations iteratively sending route prices (aggregated link prices or 
dual variables along a route) to the sources, and sources sending the route price scaled by 
the Hessian to the links along its route. Therefore, our algorithm has comparable level of 
information exchange with the first-order methods applied to the NUM problem. 

The second challenge is related to the computation of a stepsize rule that can guarantee 
local superlinear convergence of the primal iterations. Instead of the iterative backtracking 
rules typically used with Newton methods, we propose a stepsize rule which is inversely 
proportional to the inexact Newton decrement (where the inexactness arises due to errors 
in the computation of the Newton direction) if this decrement is above a certain threshold 
and takes the form of a pure Newton step otherwise. Computation of the inexact Newton 



"'^The price exchange mechanism involves destinations (end nodes of a route) sending route prices (ag- 
gregated over the hnks along the route) to sources, sources updating their rates based on these prices and 
finally links updating prices based on new rates sent over the network. 



decrement involves aggregating local information from the sources and links in the network. 
We propose a novel distributed procedure for computing the inexact Newton decrement in 
finite number of steps using again the same information exchange mechanism employed by 
first order methods. 

Since our method uses iterative schemes to compute the Newton direction, exact com- 
putation is not feasible. Another major contribution of our work is to consider a truncated 
version of this scheme, allow error in stepsize computation and present convergence rate 
analysis of the constrained Newton method when the stepsize and the Newton direction are 
estimated with some error. We show that when these errors are sufficiently small, the value 
of the objective function converges superlinearly in terms of primal iterations to a neighbor- 
hood of the optimal objective function value, whose size is explicitly quantified as a function 
of the errors and bounds on them. 

Our work contributes to the growing literature on distributed optimization and control 
of multi-agent networked systems. There are two standard approaches for designing dis- 
tributed algorithms for such problems. The first approach, as mentioned above, uses dual 
decomposition and subgradient methods, which for some problems including NUM problems 
lead to iterative distributed algorithms (see [22], [2S])- Subsequent work by Athuraliya and 
Low in pj use diagonal scaling to approximate Newton steps to speed up the subgradient 
algorithm while maintaining their distributed nature. Despite improvements in speed over 
the first-order methods, as we shall see, the performance of this modified algorithm does not 
achieve the rate gains obtained by second-order methods. 

The second approach involves considering consensus- based schemes, in which agents ex- 
change local estimates with their neighbors with the goal of aggregating information over 
an exogenous (fixed or time- varying) network topology (see [M], [H], [29], [35], [IT], [3T] . 
[I8] and [32]). It has been shown that under some mild assumption on the connectivity of 
the graph and updating rules, the distance from the vector formed by current estimates to 
consensus diminishes linearly. Consensus schemes can be used to compute the average of 
local values or more generally as a building block for developing distributed optimization 
algorithms with linear /sublinear rate of convergence ([28j). The stepsize for the distributed 
Newton method can be computed using consensus type of algorithms. However, the dis- 
tributed Newton method achieves quadratic rate of convergence for the primal iterations, 
using consensus results in prohibitively slow stepsize computation at each iteration, and is 
hence avoided in our method. 

Other than the papers cited above, our paper is also related to [1], [23], [6] and [18] . 
In [1], Bertsekas and Gafni studied a projected Newton method for optimization problems 
with twice differentiable objective functions and simplex constraints. They proposed find- 
ing the Newton direction (exactly or approximately) using a conjugate gradient method. 
This work showed that when applied to multi-commodity network fiow problems, the con- 
jugate gradient iterations can be obtained using simple graph operations, however did not 
investigate distributed implementations. Similarly, in p^, Klincewicz proposed a Newton 
method for network fiow problems that computes the dual variables at each step using an 
iterative conjugate gradient algorithm. He showed that conjugate gradient iterations can be 



iinplemented using a "distributed" scheme that involves simple operations and information 
exchange along a spanning tree. Spanning tree based computations involve passing all in- 
formation to a centralized node and may therefore be restrictive for NUM problems which 
are characterized by decentralized (potentially autonomous) sources. 

In ^ , the authors have developed a distributed Newton-type method for the NUM prob- 
lem using a belief propagation algorithm. Belief propagation algorithms, while performing 
well in practice, lack systematic convergence guarantees. Another recent paper [TS] studied 
a Newton method for equality-constrained network optimization problems and presented a 
convergence analysis under Lipschitz assumptions. In this paper, we focus on an inequality- 
constrained problem, which is reformulated as an equality-constrained problem using barrier 
functions. Therefore, this problem does not satisfy Lipschitz assumptions. Instead, we as- 
sume that the utility functions are self-concordant and present a novel convergence analysis 
using properties of self-concordant functions. 

Our analysis for the convergence of the algorithm also relates to work on convergence 
rate analysis of inexact Newton methods (see [H], ^Q\)- These works focus on providing 
conditions on the amount of error at each iteration relative to the norm of the gradient of the 
current iterate that ensures superlinear convergence to the exact optimal solution (essentially 
requiring the error to vanish in the limit). Even though these analyses can provide superlinear 
rate of convergence, the vanishing error requirement can be too restrictive for practical 
implementations. Another novel feature of our analysis is the consideration of convergence 
to an approximate neighborhood of the optimal solution. In particular, we allow a fixed error 
level to be maintained at each step of the Newton direction computation and show that 
superlinear convergence is achieved by the primal iterates to an error neighborhood, whose 
size can be controlled by tuning the parameters of the algorithm. Hence, our work also 
contributes to the literature on error analysis for inexact Newton methods. 

The rest of the paper is organized as follows: Section [2] defines the problem formulation 
and related transformations. Section [3] describes the exact constrained primal-dual Newton 
method for this problem. Section |4] presents a distributed iterative scheme for computing the 
dual Newton step and the distributed inexact Newton-type algorithm. Section [5] contains 
the rate of convergence analysis for our algorithm. Section |6] presents simulation results to 
demonstrate convergence speed improvement of our algorithm to the existing methods with 
linear convergence rates. Section [7] contains our concluding remarks. 
Basic Notation and Notions: 

A vector is viewed as a column vector, unless clearly stated otherwise. We write M+ to 
denote the set of nonnegative real numbers, i.e., M+ = [0, cxd). We use subscripts to denote 
the components of a vector and superscripts to index a sequence, i.e., Xi is the i*'* component 
of vector x and x'^ is the fcth element of a sequence. When Xi > for all components z of a 
vector X, we write x > 0. 

For a matrix A, we write Aij to denote the matrix entry in the i*^ row and j*'* column, 
and [A]i to denote the i*'^ column of the matrix A, and [A]^ to denote the j*'^ row of the 
matrix A. We write I{n) to denote the identity matrix of dimension nxn. We use x' and A' 
to denote the transpose of a vector x and a matrix A respectively. For a real-valued function 



/ : X — i- M, where X is a subset of M", the gradient vector and the Hessian matrix of / at 
X in X are denoted by V/(x) and V^/(a;) respectively. We use the vector e to denote the 
vector of all ones. 

A real-valued convex function (/ : X — t- M, where X is a subset of M, is self- concordant if 
it is three times continuously different iable and \g"'{x)\ < 2g"{x)2 for all x in its domainJj 
For real- valued functions in M", a convex function g : X ^ M., where X is a subset of 
M", is self-concordant if it is self-concordant along every direction in its domain, i.e., if the 
function g{t) = g{x + tv) is self-concordant in t for all x and v. Operations that preserve 
self-concordance property include summing, scaling by a factor a > 1, and composition with 
afiine transformation (see [9] Chapter 9 for more details). 

2 Network Utility Maximization Problem 

We consider a network represented by a set C = {1, ..., L} of (directed) links of finite nonzero 
capacity given by c = [ci]i^c with c > 0. The network is shared by a set S = {1, ..., S} of 
sources, each of which transmits information along a predetermined route. For each link /, 
let S{1) denote the set of sources use it. For each source i, let L{i) denote the set of links 
it uses. We also denote the nonnegative source rate vector by s = [sjjjg^. The capacity 
constraint at the links can be compactly expressed as 

Rs < c, 

where R is the routing matrixQoi dimension L x S, i.e., 

„ _ J 1 if link i is on the route of source j, , s 

^■^ \ otherwise. 

We associate a utility function Ui : IR+ — )■ M with each source i, i.e., Ui{si) denotes 
the utility of source i as a function of the source rate Sj. We assume the utility functions 
are additive, such that the overall utility of the network is given by J2i=i Ui{si). Thus the 
Network Utility Maximization(NUM) problem can be formulated as 

s 
maximize 

i=l 

subject to Rs < c, 
s > 0. 

We adopt the following assumption. 

^Self-concordant functions are defined through the foUowing more general definition: a real- valued three 
times continuously differentiable convex function 5 : X — > M, where X is a subset of M, is self-concordant, 
if there exists a constant a > 0, such that |5"'(a;)| < 2a~2g"(^x)^ for all x in its domain [3D], [inj. Here we 
focus on the case a = 1 for notational simplification in the analysis. 

■^This is also referred to as the link-source incidence matrix in the literature. Without loss of generality, 
we assume that each source fiow traverses at least one link, each link is used by at least one source and the 
links form a connected graph. 



Ef^^(^^) (2) 



Assumption 1. The utility functions Ui : IR+ — t- M are strictly concave, monotonically 
nondecreasing on (0, oo). The functions —Ui : IR+ — > M are self-concordant on (0, oo). 

The self-concordance assumption is satisfied by standard utility functions considered 
in the literature, for instance logarithmic, i.e., weighted proportional fair utility functions 
[55] . and concave quadratic utility functions, and is adopted here to allow a self-concordant 
analysis in establishing local quadratic convergence. We use h{x) to denote the (negative 
of the) objective function of problem (bj), i.e., h{x) = — X]i=i Ui{xi), and h* to denote the 
(negative of the) optimal value of this problemn Since h{x) is continuous and the feasible set 
of problem ^ is compact, it follows that problem ^ has an optimal solution, and therefore 
h* is finite. Moreover, the interior of the feasible set is nonempty, i.e., there exists a feasible 
solution X with Xi = ^-^ for alH G 5 with c > Or\ 

To facilitate the development of a distributea Newton-type method, we consider a re- 
lated equality-constrained problem by introducing nonnegative slack variables [yi]i£c for the 
capacity constraints, defined by 

s 
^RijSj +yi = ci for / = 1,2...L, (3) 

i=i 

and logarithmic barrier functions for the nonnegativity constraints (which can be done since 
the feasible set of (^ has a nonempty interior) Jj We denote the new decision vector by 
^ — il^ilies^ [yiYiecY- This problem can be written as 

S S+L 

minimize — >^ Ui{xi) — // 2, l^S (^j) (4) 

i=l 1=1 

subject to Ax = c, 

where A is the L x [S + L)-dimensional matrix given by 

A=[R /(L)], (5) 

and /i is a nonnegative barrier function coefficient. We use f{x) to denote the objective 
function of problem (|4]), i.e., 

S S+L 

f{x) = -^^Uiixi)- fi'^logixi), (6) 

1=1 i=l 

and /* to denote the optimal value of this problem, which is finite for positive /i|^ 

^We consider the negative of the objective function value to work with a minimization problem. 

^One possible value for c is c = min;{c;}. 

^We adopt the convention that log(a;) = — oo for x < 0. 

^This problem has a feasible solution, hence /* is upper bounded. Each of the variable Xi is upper 
bounded by c, where c — maxjJQ}, hence by monotonicity of utility and logarithm functions, the optimal 
objective function value is lower bounded. Note that in the optimal solution of problem (H) x^ 7^ for all i, 
due to the logarithmic barrier functions. 



By Assumption [I| the function f{x) is separable, strictly convex, and has a positive 
definite diagonal Hessian matrix on the positive orthant. The function f{x) is also self- 
concordant for fi > 1, since both summing and scaling by a factor /i > 1 preserve self- 
concordance property. 

We write the optimal solution of problem Q for a fixed barrier function coefficient ft 
as x{fi). One can show that as the barrier function coefficient fi approaches 0, the optimal 
solution of problem (|4]) approaches that of problem ([2]), when the constraint set in ^ has 
nonempty interior and is convex [3], [15]. Hence by continuity from Assumption [11 h{x{jj,)) 
approaches h*. Therefore, in the rest of this paper, unless clearly stated otherwise, we study 
iterative distributed methods for solving problem Q for a given /i. In order to preserve the 
self-concordance property of the function /, which will be used in our convergence analysis, 
we first develop a Newton- type algorithm for /i > 1. In Section 5.3[ we show that problem 



Q for any fi > can be tackled by solving two instances of problem Q with different 
coefficients /i > 1, leading to a solution x(/i) that satisfies fe, — < « for any positive 
scalar a. 

3 Exact Newton Method 

For each fixed /i, problem Q is feasible and has a convex objective function, affine con- 
straints, and a finite optimal value /*. Therefore, we can use a strong duality theorem to 
show that, for problem Q, there is no duality gap and there exists a dual optimal solution 
(see [5]). Moreover, since matrix A has full row rank, 

we can use a (feasible start) equality-constrained Newton method to solve problem (|4])(see 
[9] Chapter 10). 

3.1 Feasible Initialization 

We initialize the algorithm with some feasible and strictly positive vector x". For example, 
one such initial vector is given by 

^°=^ ioTt = l,2...S, (7) 



S+l 

s 






where q is the finite capacity for link /, c is the minimum (nonzero) link capacity, S is the 
total number of sources in the network, and R is routing matrix [cf. Eq. ([I|]. 

3.2 Iterative Update Rule 

Given an initial feasible vector x°, the algorithm generates the iterates by 

^k+i = ^k^ d''Ax\ (8) 



where d!' is a positive stepsize, Ax^ is the (primal) Newton direction given as the solution 
of the following system of linear equations! 



y'^fi.x^) A' \ / AxM _ _ / Vf{x^ 
A Q l\ w^ I ~ \ 



k \=- \ a ] ■ (9) 



We will refer to x^ as the primal vector and w^ as the dual vector (and their components 
as primal and dual variables respectively). We also refer to w'^ as the price vector since the 
dual variables [w7f];g£ associated with the link capacity constraints can be viewed as prices 
for using links. For notational convenience, we will use Hk = V'^f{x^) to denote the Hessian 
matrix in the rest of the paper. 

Solving for Ax'' and w'^ in the preceding system yields 

Ax'' = -H^\Vf{x'') + A'w''), (10) 

{AH-'A')w'' = -AH.'Vfix''). (11) 

This system has a unique solution for all k. To see this, note that the matrix Hk is a 
diagonal matrix with entries 



d^U,{x1) 



+ 7:^ 1 < ^ < ^, 



{ (^ S + l<i<S + L. 

the functions Ui are strictly concave, which implies — „^2 ' < 0. Moreover, 
the primal vector x'' is bounded (since the method maintains feasibility) and, as we shall 



see in Section 4.4, can be guaranteed to remain strictly positive by proper choice of stepsize. 
Therefore, the entries {Hk)ii > and are well-defined for all i, implying that the Hessian 
matrix Hk is invertible. Due to the structure of A [cf. Eq. (Is])], the column span of A is 
the entire space M'^, and hence the matrix AH^^A' is also invertiblen This shows that the 
preceding system of linear equations can be solved uniquely for all k. 

The objective function / is separable in Xj, therefore given the vector wf for / in L{i), the 
Newton direction Axf can be computed by each source i using local information available 
to that source. However, the computation of the vector w'' at a given primal solution x'' 
cannot be implemented in a decentralized manner since the evaluation of the matrix inverse 
{AHj^^A')~^ requires global information. The following section provides a distributed inexact 
Newton method, based on computing the vector w'' using a decentralized iterative scheme. 



®This is a primal-dual method with the vectors Ax'' and w'^ acting as primal direction and dual variables 

respectively 

1 

= 0, which implies 



^If for some a; G M^, we have AH), ^A'x = 0, then x'AH^ ^A'x = 



Hk ' A'x 



2 



||v4'a;||2 = 0, because the matrix H is invertible. The rows of the matrix A' span M , therefore we have 



a; = 0. This shows that the matrix AHj^ ^A' is invertible. 



4 Distributed Inexact Newton Method 

In this section, we introduce a distributed Newton method using ideas from matrix sphtting 
in order to compute the dual vector w^ at each k using an iterative scheme. Before proceeding 
to present the details of the algorithm, we first introduce some preliminaries on matrix 
splitting. 

4.1 Preliminaries on Matrix Splitting 

Matrix splitting can be used to solve a system of linear equations given by 

Gy = a, 

where G is an n x n matrix and a is an n-dimensional vector. Suppose that the matrix G 
can be expressed as the sum of an invertible matrix M and a matrix N, i.e., 

G = M + N. (13) 

Let j/o be an arbitrary n-dimensional vector. A sequence {y^} can be generated by the 
following iteration: 



.k+l _ i\/r-lAT„.k I (i/T-l 



y 



M-^Ny^ + M-^a. (14) 

It can be seen that the sequence {y^} converges as k -^ oo ii and only if the spectral radius 
of the matrix M~^N is strictly bounded above by 1. When the sequence {y^} converges, its 
limit y* solves the original linear system, i.e., Gy* = a (see [2] and [13] for more details). 
Hence, the key to solving the linear equation via matrix splitting is the bound on the spectral 
radius of the matrix M~^N. Such a bound can be obtained using the following result (see 
Theorem 2.5.3 from [13J). 

Theorem 4.1. Let G be a real symmetric matrix. Let M and A^ be matrices such that 
G = M + N and assume that M is invertible and both matrices M + N and M — N 
are positive definite. Then the spectral radius of M~'^N, denoted by p{M~^N), satisfies 
p{M-^N) < 1. 

By the above theorem, if G is a real, symmetric, positive definite matrix and M is a 



nonsingular matrix, then one sufficient condition for the iteration (14) to converge is that 
the matrix M — A^ is positive definite. This can be guaranteed using Gershgorin Circle 
Theorem, which we introduce next (see [2Z] for more details). 

Theorem 4.2. (Gershgorin Circle Theorem) Let G be an n x n matrix, and define ri{G) = 
Ylijj.i \^i3 1 • Then, each eigenvalue of G lies in one of the Gershgorin sets {Lj}, with Fj defined 
as disks in the complex plane, i.e., 

Vi = {zeC\ \z-Gu\<n{G)}. 



One corollary of the above theorem is that if a matrix G is strictly diagonally dominant, 
i.e., \Gii\ > J2j=/=i \^ij\y ^^d Gii > for all i, then the real parts of all the eigenvalues lie in 
the positive half of the real line, and thus the matrix is positive definite. Hence a sufficient 
condition for the matrix M — A^ to be positive definite is that M — iV is strictly diagonally 
dominant with strictly positive diagonal entries. 

4.2 Distributed Computation of the Dual Vector 

We use the matrix splitting scheme introduced in the preceding section to compute the dual 



vector w'' in Eq. (11) in a distributed manner. Let D^ be a diagonal matrix, with diagonal 
entries 

iDk)u = {AH,'A')a, (15) 

and matrix Bk be given by 

Bk = AH~^A' - Dk. (16) 

Let matrix Bk be a diagonal matrix, with diagonal entries 

L 

{Buh = Y.^Bk).r (17) 

i=i 

By splitting the matrix AHj^^A' as the sum of Dk + Bk and Bk — Bk, we obtain the following 
result. 



Theorem 4.3. For a given k > 0, let Dk, Bk, Bk be the matrices defined in Eqs. (15), (16) 



and (17). Let w{0) be an arbitrary initial vector and consider the sequence {'w{t)} generated 



by the iteration 

w{t + 1) = {Dk + Bk)-\Bk - Bk)w{t) + {Dk + Bk)-\-AH^'Vf{x')), (18) 

for all t > 0. Then the spectral radius of the matrix {Dk + Bk)^^{Bk — Bk) is strictly bounded 
above by 1 and the sequence {'w(t)} converges as t — )■ oo, and its limit is the solution to Eq. 



(11). 



Proof. We split the matrix AHj^ ^A' as 



-1 A' 



{AH^'A) = {Dk + Bk) + {Bk - Bk) (19) 



and use the iterative scheme presented in Eqs. (13) and (14) to solve Eq. (11). For all k, 
both the real matrix Hk and its inverse, H^^ , are positive definite and diagonal. The matrix 
A has full row rank and is element-wise nonnegative. Therefore the product AH^^A' is real, 
symmetric, element-wise nonnegative and positive definite. We let 

Qk = {Dk + Bk) - {Bk - Bk) =Dk + 2Bk - Bk (20) 



denote the difference matrix. By definition of B^. [cf. Eq. (17)], the matrix 25^ — B^ is 
diagonally dominant, with nonnegative diagonal entries. Moreover, due to strict positivity 
of the second derivatives of the logarithmic barrier functions, we have {Dkju > for all i. 
Therefore the matrix Qk is strictly diagonally dominant. By Theorem |4.2[ such matrices 
are positive definite. Therefore, by Theorem 4.1, the spectral radius of the matrix [Dk + 
Bk)^^{Bk — Bj?) is strictly bounded above by 1. Hence the splitting scheme (19) guarantees 
the sequence {w{t)} generated by iteration (18) to converge to the solution of Eq. (11). D 



This provides an iterative scheme to compute the dual vector w^ at each primal iteration 



k using an iterative scheme. We will refer to the iterative scheme defined in Eq. (18) as the 
dual iteration. 

There are many ways to split the matrix AH^^A. The particular one in Eq. (19) is 



chosen here due to three desirable features. First it guarantees that the difference matrix Qk 



[cf. Eq. (20)] is strictly diagonally dominant, and hence ensures convergence of the sequence 
{w{t)}. Second, with this splitting scheme, the matrix D^ + Bk is diagonal, which eliminates 
the need for global information when calculating its inverse. The third feature enables us 



to study convergence rate of iteration (18) in terms of a dual (routing) graph which we 
introduce next. 

Definition 1. Consider a network Q = {C,S}, represented by a set £ = {1,...,L} of 
(directed) links, and a set S = {1,...,S} of sources. The links form a strongly connected 
graph, and each source sends information along a predetermined route. The weighted dual 
(routing) graph Q = {J\f, £}, where J\f is the set of nodes, and £ is the set of (directed) links 
defined by: 
k.M = C- 

B. A link is present between node L, to Lj in Q if and only if there is some common flow 
between Lj and Lj in Q. 

C. The weight Wij on the link from node Lj to Lj is given by 



Wj., 



{Dk + Bk)t{Bk\, = {Dk + B, 



v-l 



{AH^^A\, = {Dk + B, 



\-i 



s&s{i)ns(j) 



where the matrices D^, B^, and B^ are defined in Eqs. (15), (16) and (17). 



One example of a network and its dual graph are presented in Figures [T] and [2j Note that 
the unweighted indegree and outdegree of a node are the same in the dual graph, however 
the weights are different depending on the direction of the links. The splitting scheme in 
Eq. (19) involves the matrix {Dk + Bk)~^{Bk — Bk), which is the weighted Laplacian matrix 
of the dual graph j^ The weighted out-degree of node i in the dual graph, i.e., the diagonal 



^°We adopt the following definition for the weighted Laplacian matrix of a graph. Consider a weighted 
directed graph Q with weight Wij associated with the link from node i to j. We let Wij = whenever the 
link is not present. These weights form a weighted adjacency matrix W. The weighted out-degree matrix 
D is defined as a diagonal matrix with Da = ^ Wij and the weighted Laplacian matrix L is defined as 
L = D — W. See [7], [12; for more details on graph Laplacian matrices. 
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Figure 1: A sample network. Each source- destination pair is displayed with the same color. 
We use Xi to denote the flow corresponding to the i*^ source-destination pair and Li to denote 
the i^^ link. 

Xi 




Figure 2: Dual graph for the network in Figure [l| each link in this graph corresponds to the 
flows shared between the links in the original network. 



entry [D^ + -B^)^^ Ba of the Laplacian matrix, can be viewed as a measure of the congestion 
level of a link in the original network since the neighbors in the dual graph represent links 



that share flows in the original network. We show in Section pJ] that the spectral properties 
of the Laplacian matrix of the dual graph dictate the convergence speed of dual iteration 



(18) 



We next rewrite iteration (18), analyze the information exchange required to implement it 



and develop a distributed computation procedure to calculate the dual vector. For notational 
convenience, we define the price of the route for source i, 7rj(t), as the sum of the dual variables 
associated with links used by source i at the t*'* dual iteration, i.e., 7rj(t) = X]/eL(i) ^'('^)' 
Similarly, we define the weighted price of the route for source i, nj(t), as the price of the 
route for source i weighted by the ith diagonal element of the inverse Hessian matrix, i.e., 

Mt) = {H^%,Y.i^^^i)Mt)- 
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Lemma 4.4. For each primal iteration k, the dual iteration ( |18[ ) can be written as 

^Kt+i)=77^-^ '-^ — =— (( E iim-j:(H.)u')Mt)-Y. Mt) 

(21) 

ies{i) ies{i) 

where nj(0) is the weighted price of the route for source i when w{0) = [1,1..., 1]'. 

Proof. Recall the definition of matrix A, i.e.. An = 1 for i = 1, 2 ... 5* if source i uses link /, 
i.e., i G S{1), and An = otherwise. Therefore, we can write the price of the route for source 
i as, TTi{t) = '^i^iAiiw{t)i = [A'yw{t). Similarly, since the Hessian matrix H^ is diagonal, 
the weighted price can be written as 

n.(t) = {H,)T,'[Arw{t) = [Hj^'Afwit). (22) 

On the other hand, since A = [R I{L)], where R is the routing matrix, we have 

s 

i=l 
S 



Y,M[Hj;'A'Mt)) + {H^')^s+i)is+iMt). 



i=l 



Using the definition of the matrix A one more time, this implies 

{AH-'A'wm = E l^k'^rwit) + iH,%^,^^s+i)Wiit) (23) 

iesn) 

= E n.(^) + iH-')(s+i)is+i)Wiit), 



where the last equality follows from Eq. (22). 



Using Eq. (16), the above relation implies that {{Bk + Dk)w{t))i = Xliesf/) ■'^«(^) + 
{Hj^^)(^s+i)is+i)Wi{t). We next rewrite {Bk)ii. Using the fact that w{0) = [1, 1 . . . , 1]', we 
have 

L 

{AH^'A'w{0))i = {{Bk + Dk)w{0))i = ^(5^),, + {Dk)u. 



Using the definition of Bk [cf. Eq. (17)], this implies 



{Bk)u = J2{Bk)ij = {AH^'A'w{0))i - {Dk)u 
= Yl n.(0) + iH,;')^s+D(s+i) - {Dk)u. 
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This calculation can further be simplified using 



(H, -^ 



k){S+l){S+l)^ 



[cf. Eq. (15)], yielding 

mu = Y. n.(o) - Y. ^'^^^ 

Following the same argument, the value {Bkw{t))i for all t can be written as 

{Bkw{t))i = {AH-'A'w{t))i - {Dkw{t))i 
s 
= 5^n,(t) + {H,')^s+i)(s+i)Wiit) - {Dk)uwi{t) 



(24) 



(25) 
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5^n,(t)- 5^ (//.),>z(t), 

ies{i) 



i=l 



the last equality follows from Eq. (24) 



where the first equality follows from Eq. (17), the second equality follows from Eq. ( 23 ) , and 



Finally, we can write {AH^. ^Vf{x^))i as 

{AH-'Vf{x'))i = J2 {H^%^J{x^) + {H-')^s+i)is^i)Vs+if{x^). 



Substituting the preceding into (18), we obtain the desired iteration (21). 



D 



We next analyze the information exchange required to implement iteration (21) among 



sources and links in the network. We first observe the local information available to sources 
and links. Each source i knows the ith diagonal entry of the Hessian {Hkja and the ith 
component of the gradient Vj/(x'^). Similarly, each link / knows the (5* + /)th diagonal 
entry of the Hessian {Hk)s+i,s+i and the {S + /)th component of the gradient Vs+if{x^)- 
In addition to the locally available information, each link /, when executing iteration ( [2T| , 
needs to compute the terms: 

i£S(i) i£S{i) i£S{i) ieS{i) 

The first two terms can be computed by link / if each source sends its local information to the 
links along its route "once" in primal iteration k. The third term can be computed by link 
/ again once for every k if the route price 7rj(0) (aggregated along the links of a route when 
link prices are all equal to 1) are sent by the destination to source i, which then evaluates 
and sends the weighted price Hj(0) to the links along its route. The fourth term can be 
computed with a similar feedback mechanism, however the computation of this term needs 
to be repeated for every dual iteration t. 
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Figure 3: Direction of information flow for the 
steps l.a, l.c and 2.b, from sources to the hnks 
they use. 




Figure 4: Direction of flow for the steps l.b 
and 2. a, from links to the sources using them. 



The preceding information exchange suggests the following distributed implementation 



of (21) (at each primal iteration k) among the sources and the links, where each source or 



link is viewed as a processor, information available at source i can be passed to the links it 
traverses, i.e., / G L{i), and information about the links along a route can be aggregated and 
sent back to the corresponding source using a feedback mechanism: 

1. Initialization. 



l.a Each source i sends its local information {Hi^ja and Vj/(x'^) to the links along its 
route, / G L{i). Each link / computes (^fe)(5%(5+,), E^e5(/)(^fc)^^^ {H^^){s+i){s+i) 
and Y.^esi^l){H^%^^f{x^). 

l.b Each link / starts with price Wi(0) = 1. The link prices ^^(0) are aggregated along 
route i to compute 7r(0) = XlzeLfi) ^K^) at the destination. This information is 
sent back to source i. 

l.c Each source computes the weighted price nj(0) = {H^^)ii J2i(^L(i) "^li^) and sends 
it to the links along its route, / G L{i). 

l.d Each link / then initializes with arbitrary price wi{l). 
2. Dual Iteration. 



2. a The link prices wi{t) are updated using (21) and aggregated along route i to 



compute 7r(t) at the destination. This information is sent back to source i. 

2.b Each source computes the weighted price Ili{t) and sends it to the links along its 
route, / G L{i). 

The direction of information flow can be seen in Figures |3] and |4j 

Note that the sources need to send their Hessian and gradient information once per primal 
iteration since these values do not change in the dual iterations. Moreover, this algorithm has 
comparable level of information exchange with the subgradient based algorithms applied to 
the NUM problem ^ (see [T], [21], [25J, [27J for more details). In both types of algorithms, 
only the sum of prices of links along a route is fed back to the source, and the links update 



14 



prices based on scalar information sent from sources using that link. The computation here 
is slightly more involved since it requires scaling by Hessian matrix entries, however all 
operations are scalar-based, hence does not impose degradation on the performance of the 
algorithm. 

4.3 Distributed Computation of the Primal Newton Direction 

Once the dual variables are computed, the primal Newton direction can be obtained accord- 
ing to Eq. ( |Io| ) as 

{Ax% = -{H,)-^{VdXx'') + iA'w%) = -{H,)-.^Vd{x') + n„ (26) 

where IIj is the weighted price of the route for source i computed at termination of the dual 
iteration. Hence, the primal Newton direction can be computed using local information by 
each source. However, because the dual variable computation involves an iterative scheme, 
the exact value for w^ is not available. Therefore, the direction Ax^ computed using Eq. 
(26) may violate the equality constraints in problems ^. To maintain feasibility of the 



generated primal vectors, the calculation of the inexact Newton direction at a primal vector 
x^ , which we denote by Ax'^, is separated into two stages. 

In the first stage, the first 5* components of Ax^, denoted by As'^, is computed via Eq. 



(26) using the dual variables obtained via the iterative scheme, i.e., 

AS.' = -{Hu)i^{^^f{x') + [^T^')- (27) 

In the second stage, the last L components of Ax^ (corresponding to the slack variables) are 
computed to ensure that the condition A/S.x^ = is satisfied, i.e. 

Ax' = ( _^£^, ) . (28) 

This calculation involves each link computing the slack introduced by the first S components 
of Ax^. 

The algorithm presented generates the primal vectors as follows: Let x^ be an initial 
strictly positive feasible primal vector (see Eq. ([T]) for one possible choice). For any fc > 0, 
we have 

^fc+i = a-fc + c^'=Ax^ (29) 

where d^ is a positive stepsize and Ax*"' is the inexact Newton direction at primal vector 
x^ (obtained through an iterative dual variable computation scheme and a two-stage primal 
direction computation that maintains feasibility). We will refer to this algorithm as the 
(distributed) inexact Newton method. 
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4.4 Stepsize Rule 

We next describe a stepsize rule that can be computed in a distributed manner while achiev- 
ing local superlinear convergence rate (to an error neighborhood) for the primal iterations. 
This rule will further guarantee that the primal vectors x^ generated by the algorithm re- 
main strictly positive for all /c, hence ensuring that the Hessian matrix is well-defined at all 



iterates (see Eq. (12) and Theorem 4.6) 



Our stepsize rule will be based on an inexact version of the Newton decrement. At a given 
primal vector x^ (with Hessian matrix Hk)-, we define the exact Newton direction, denoted 
by Ax'^, as the exact solution of the system of equations (^. The exact Newton decrement 
\{x^) is defined as 

\{x^) = ^/{Ax''yHkAxK (30) 

Similarly, the inexact Newton decrement A(a;'^) is given by 

~X{x'') = ^/[M^yihM^, (31) 

where Ax^ is the inexact Newton direction at primal vector x^. Note that both X{x^) and 
X{x^) are nonnegative and well-defined due to the fact that the matrix V^/(x'') is positive 
definite. 

Our stepsize rule involves the inexact Newton decrement X{x''), we use 6^ to denote the 
approximate value of A(x'^) obtained through some distributed computation procedure. One 
possible such procedure with finite termination yielding 6^ = A(x'^) exactly is described in 
Appendix |A| However, other estimates 9'^ can be used, which can potentially be obtained 
by exploiting the diagonal structure of the Hessian matrix, writing the inexact Newton 
decrement as 

A(x^) = / J2 i^S:'')KHkh = ViL + S)y, 
]] iecijs 

where y = ^^ J2ies\j c(^^^)'i(^k)ii and using consensus type of algorithms. 

Given the scalar 6'', an approximation to the inexact Newton decrement A(x'^), at each 
iteration k, we choose the stepsize d^ as follows: Let V be some positive scalar with < 
V < 0.267. We have 



jk / 0fcXY if 9'^ >V for all previous k, 
\ 1 otherwise. 



(32) 



where b G (0, 1). The upper bound on V will be used in analysis of the quadratic convergence 
phase of our algorithm [cf. Assumption 111 . This bound will also ensure the strict positivity 



of the generated primal vectors [cf. Theorem 4.6 



There can be two sources of error in the execution of the algorithm. The first is in the 
computation of the inexact Newton direction, which arises due to iterative computation of 
the dual vector w'' and the modification we use to maintain feasibility. Second source of error 
is in the stepsize rule, which is a function of 9^, an approximation to the inexact Newton 
decrement A(x'^). We next state two assumptions that quantify the bounds on these errors. 
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Assumption 2. Let {x*^} denote the sequence of primal vectors generated by tlie distributed 
inexact Newton method. Let Aa;'^ and Ax^ denote the exact and inexact Newton directions 
at x^, and 7^^ denote the error in the Newton direction computation, i.e., 

Ax'' = Ax'' + 7^ (33) 

For all k, 7^ satisfies 

|(7'=)'VV(a;^)7l < p\Ax''yV^f{x'')Ax'' + e. (34) 

for some positive scalars p < 1 and e. 

This assumption imposes a bound on the weighted norm of the Newton direction error 7^^ 
as a function of the weighted norm of Ax'' and a constant e. Note that without the constant 
e, we would require this error to vanish when x'' is close to the optimal solution, i.e., when 
Ax'' is small, which is impractical for implementation purposes. Given p and e, one can 
devise distributed schemes for determining the number of dual iterations needed so that the 
resulting error 7^^ satisfies this Assumption (see Appendix |b|). 

We bound the error in the inexact Newton decrement calculation as follows. 

Assumption 3. Let t'' denote the error in the Newton decrement calculation, i.e., 

r'' = ~X{x'') - e''. (35) 

For all fc, t'' satisfies 

|r1<Q-l)(l + ^). 

This assumption will be used in establishing the strict positivity of the generated primal 
vectors x''. Given h and V , using convergence rate results for average consensus schemes (see 
[32j,|28j), one can provide a lower bound on the number of average consensus steps needed 
so that the error t'' satisfies this assumption. When the method presented in Appendix |A] is 
used to compute 0'' , then we have r'^ = for all k and the preceding assumption is satisfied 
clearly. Throughout the rest of the paper, we assume the conditions in Assumptions [l]J3] 
hold. 



We next show that the stepsize choice in (32) will guarantee strict positivity of the primal 



vector X generated by our algorithm. This is important since it ensures that the Hessian H 
and therefore the (inexact) Newton direction is well-defined at each iteration. We proceed 
by first establishing a bound on the error in the stepsize calculation. 

Lemma 4.5. Let 9'' be an approximation of the inexact Newton decrement \[x'') defined 
in (31). For 9'' > V, we have 



(26 - l)/(A(x^) + 1) < — - < l/CXix") + 1), (36) 



where b G (0, 1) is the constant used in stepsize choice (32) 
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1 Wi + e' 



Proof. By Assumption [s] and the fact ^'^ > V^, we have 

|A(x^)-^^l<Q-l)(l + \/)<Q 
By multiplying both sides by the positive scalar b, the above relation implies 

be'' - b~x{x'') < {i-b){i + d''), 

which yields 

(26 - 1)^'= + (26 - 1) <6A(a:'=) + 6. 
By dividing both sides of the above relation by the positive scalar 



(37) 



^• + l)(A(x^) + 1), we 



obtain the first inequality in Eq. (36). 



Similarly, using Eq. (37) we can establish 

b~x{x'')-be'' < {i-b){i + e''), 

which can be rewritten as 

b~X{x'') + b<e'' + l. 
After dividing both sides of the preceding relation by the positive scalar {9'' + l)(A(x'^) + 1) 



we obtain the second inequality in Eq. (36) 
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With this bound on the stepsize error, we can show that starting with a strictly positive 
feasible solution, the primal vectors x'^ generated by our algorithm remain positive for all k. 

Theorem 4.6. Given a strictly positive feasible primal vector x^, let {x'^} be the sequence 
generated by the inexact distributed Newton method (29). Assume that the stepsize d'' is 
selected according to Eq. (32) and the constant 6 satisfies ^yzi < 6 < 1. Then, the primal 



vector x'^ is strictly positive for all k. 

Proof. We will prove this claim by induction. The base case of x° > holds by the as- 
sumption of the theorem. Since the Ui are strictly concave [cf. Assumption I], for any 



we have 






xf) > 0. Given the form of the Hessian matrix [cf. Eq. (12)], this implies 



[Hk)ii > T^yi for all i, and therefore 

/S+L 



X 




> max,- 



^A 






where the last inequality follows from the nonnegativity of the terms /x 
the reciprocal on both sides, the above relation implies 






X{x^) 



< 



1 



max,- 



y^JlAx^ 



-mm. 



/^ 



Ax? 



< min,- 



X" 



Ax 






By taking 



(38) 



where the last inequality follows from the fact that // > 1. 
We show the inductive step by considering two cases. 



1^ 



Case i: e''>V 



By Lemma 45, the stepsize d'' satisfies 

d"" <1/{1 + X{x''))< l/X{x''). 

Using Eq. (38), this imphes d'^ < min. 



0. 



Ax* 



. Hence if x^ > 0, then x'^"'"^ = x'^+d'^Ax'' > 



2y+i' 



Case ii: 6'' < V 

By Assumption 3, we have X{x'') < l^+ (^ - l) (1 + V). Using the fact that b > ^^^^±^ 

we obtain 

Xix') <V+(^-l\il + V)<V+ (^^ - l") (1 + y) = 2F < 1, 

where the last inequahty follows from the fact that V < 0.267. Hence we have d 
1 < T7^ < min,: 



A(xfe) 



Ax? I ' 



where the last inequality follows from Eq. (38). Once again, if 



X 



> 0, then x''^^ = x'' + d'^Ax^ > 0. 



In both cases we have x^^^ = x^ + d^Ax'^ > 0, which completes the induction proof. D 

In the rest of the paper, we will assume that the constant b used in the definition of the 



stepsize satisfies 



v+i 

2V+\ 



<b<\. 



5 Convergence Analysis 

We next present our convergence analysis for both primal and dual iterations. We first 
establish convergence for dual iterations. 

5.1 Convergence in Dual Iterations 



We characterize the rate of convergence of the dual iteration (18). We will use the following 
lemma 136). 



Lemma 5.1. Let M be an n x n matrix, and assume that its spectral radius, denoted 
by p{M), satisfies p{M) < 1. Let {Ai}j=i,..,^„ denote the set of eigenvalues of M, with 
1 > I All > IA2I > ... > |A„| and let Vi denote the set of corresponding unit length right 
eigenvectors. Assume the matrix has n linearly independent eigenvectors p] Then for the 
sequence w{t) generated by the following iteration 



w{t + l) = Mw{t), 



(39) 



^^An alternative assumption is that the algebraic multiplicity of each A^ is equal to its corresponding 
geometric multiplicity, since eigenvectors associated with different eigenvalues are independent pi] . 
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we have 

\\w{t)-w*\\^<\Xi\*a, (40) 



for some positive scalar a, where w* is the hmit of iteration (39) as t — )■ oo. 



We use M to denote the L x L matrix, M = {Dk + -B^) ^{Bk — B^), and z to denote 



the vector z = {Dk + Bk) ^{—AH^ ^V f{x^)). We can rewrite iteration (18) as w{t + 1) 
Mw{t) + z, which imphes 

q-l 

w{t + q) = M'^w{t) + Y^ M'z = W^wit) + (/ - M^)(/ - M)-^z. 

i=0 



This alternative representation is possible since p{M) < 1, which follows from Theorem 4.3 
After rearranging the terms, we obtain 

w{t + q)= M\w{t) -{I - My^z) + (/ - M)-^z. 

Therefore starting from some arbitrary initial vector w{0), the convergence speed of the 
sequence w(t) coincides with the sequence u{t), generated by u{t + q) = M''u{0), where 
u{0) = w{0) - M{I - M)-^z. 

We next show that the matrix M has L linearly independent eigenvectors in order to 
apply the preceding lemma. We first note that since the nonnegative matrix A has full row 
rank and the Hessian matrix H has positive diagonal elements, the product matrix AH^^A' 
has positive diagonal elements and nonnegative entries. This shows that the matrix Dk [cf. 



Eq. (15)] has positive diagonal elements and the matrix B [cf. Eq. (17)] has nonnegative 
entries. Therefore the matrix [Dk + Bk)""^ is diagonal and nonsingular. Hence, using the 
relation M = {Dk + Bk)^M{Dk + Bk)'^ , we see that the matrix M = {Dk + BkY^iBk- Bk) 
is similar to the matrix M = {Dk + Bk)^'^{Bk — Bk){Dk + Bk)~2. From the definition of 
Bk [cf. Eq. (16)] and the symmetry of the matrix AHj^^A', we conclude that the matrix B 



is symmetric. This shows that the matrix M is symmetric and hence diagonalizable, which 
implies that the matrix M is also diagonalizable, and therefore it has L linearly independent 



eigenvectors^^ We can use Lemma 5.1 to infer that 



\\w{t) — ti'*||2 = \\u(t) — u*\\2 < |Ai|*a, 

where Ai is the eigenvalue of M with largest magnitude, and a is a constant that depends on 
the initial vector u{0) = w{0) — (/ — M)^^z. Hence Ai determines the speed of convergence 
of the dual iteration. 

We next analyze the relationship between Ai and the dual graph topology. First note that 
the matrix M = {Dk + Bk)~^{Bk — Bk) is the weighted Laplacian matrix of the dual graph [cf. 



^^If a square matrix A of size n x n is symmetric, then A has n hnearly independent eigenvectors. If a 
square matrix B of size n x n is similar to a symmetric matrix, then B has n hnearly independent eigenvectors 

m. 
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Section 4.2 , and is therefore positive semidefinite [T2]. We then have p{M) = |Ai| = Ai > 0. 



From graph theory [26], Theorem 4.3 and the above analysis, we have 
4mc(M) 



< Ai < min |2m^ [{Dk + Bk)-^Bk]^^ , l| , 



(41) 




L 

where mc(M) is the weighted maximum cut of the dual graph, i.e., 

mc(M) = max 

SCAT 

where Wij is the weight associated with the link from node itoj. The above relation suggests 
that a large maximal cut of the dual graph provides a large lower bound on Ai, implying the 
dual iteration cannot finish with very few iterates. When the maximum weighted out-degree, 
i.e., max/gi, [{Dk + Bk)~^Bk\^i, in the dual graph is small, the above relation provides a small 
upper bound on Ai and hence suggesting that the dual iteration converges fast. 

We finally illustrate the relationship between the dual graph topology and the underlying 
network properties by means of two simple examples that highlight how different network 
structures can affect the dual graph and hence the convergence rate of the dual iteration. 
In particular, we show that the dual iteration converges slower for a network with a more 
congested link. Consider two networks given in Figures |5] and [7} whose corresponding dual 
graphs are presented in Figures [6] and [8] respectively. Both of these networks have 3 source- 
destination pairs and 7 links. However, in Figure |5] all three flows use the same link, i.e., L4, 
whereas in Figure [7] at most two flows share the same link. This difference in the network 
topology results in different degree distributions in the dual graphs as shown in Figures |6] 
and [8J To be more concrete, let Ui{si) = 151og(sj) for all sources i in both graphs and 
link capacity q = 35 for all links /. We apply our distributed Newton algorithm to both 
problems, for the primal iteration when all the source rates are 10, the largest weighted 
out-degree in the dual graphs of the two examples are 0.46 for Figure |6] and 0.095 for Figure 
[8} which implies the upper bounds for Ai of the corresponding dual iterations are 0.92 and 
0.19 respectively [cf. Eq. ([4l])]. The weighted maximum cut for Figure M is obtained by 
isolating the node corresponding to L4, with weighted maximum cut value of 0.52. The 
maximum cut for Figure IS] is formed by isolating the set {L4, Lg}, with weighted maximum 
cut value of 0.17. Based on ( [4l| ) these graph cuts generate lower bounds for Ai of 0.30 and 
0.096 respectively. By combining the upper and lower bounds, we obtain intervals for Ai as 
[0.30,0.92] and [0.096,0.19] respectively. Recall that a large spectral radius corresponds to 



slow convergence in the dual iteration [cf. Eq. (40)], therefore these bounds guarantee that 
the dual iteration for the network in Figure [7| which is less congested, converges faster than 
for the one in Figure [5j Numerical results suggest the actual largest eigenvalues are 0.47 and 
0.12 respectively, which confirm with the prediction. 



21 




Figure 5: Each source-destination pair is displayed with the same color. We use Xi to denote 
the flow corresponding to the i*'* source-destination pair and Lj to denote the i^'^ link. All 3 
flows traverse link L4. 




Figure 6: Dual graph for the network in Figure [5| each link in this graph corresponds to the 
flows shared between the links in the original network. The node corresponding to link L4 
has high unweighted out-degree equal to 6. 
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Figure 7: Each source-destination pair is displayed with the same color. We use Xi to denote 
the flow corresponding to the i*"^ source-destination pair and Li to denote the i*^ link. Each 
link has at most 2 flows traversing it. 




Figure 8: Dual graph for the network in Figure [8| each link in this graph corresponds to the 
flows shared between the links in the original network. Both nodes corresponding to links 
L4 and Lq has relatively high out-degree equal to 4. 
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5.2 Convergence in Primal Iterations 

We next present our convergence analysis for the primal sequence {x^} generated by the 
inexact Newton method (29). For the k^^ iteration, we define the function /^ : M — )■ M as 



fkit) = fix' + tAx'), (42) 

which is self-concordant, because the objective function / is self-concordant. Note that the 
value /fc(0) and fk{d'') are the objective function values at x^ and x^^^ respectively. Therefore 
fk{,d^) — fk{0) measures the decrease in the objective function value at the k^^ iteration. We 
will refer to the function fk as the objective function along the Newton direction. 

Before proceeding further, we first introduce some properties of self-concordant functions 
and the Newton decrement, which will be used in our convergence analysis f^ 

5.2.1 Preliminaries 

Using the definition of a self-concordant function, we have the following result (see |9] for 
the proof). 

Lemma 5.2. Let / : M — )■ M be a self-concordant function. Then for alH > in the domain 
of the function / with tf"{0)^ < 1, the following inequality holds: 

m < /(O) + t/(0) - tf (0)i - log(l - tf (0)i). (43) 

We will use the preceding lemma to prove a key relation in analyzing convergence prop- 



erties of our algorithm [see Lemma 5.8| . The next lemma will be used to relate the weighted 



norms of a vector z, with weights V^/(x) and V^f{y) for some x and y. This lemma plays 
an essential role in establishing properties for the Newton decrement (see [19] , [30] for more 
details) . 

Lemma 5.3. Let / : M" — > M be a self-concordant function. Suppose vectors x and y are in 
the domain of / and A = {{x — yyW"^ f {x){x — y))^ < 1, then for any z G M", the following 
inequality holds: 

(1 - ~\fz'V'f{x)z < z'V^f{y)z < -^^^z'V^f{x)z. (44) 

The next two lemmas establish properties of the Newton decrement generated by the 
equality-constrained Newton method. The first lemma extends results in [19] and [30] to 
allow inexactness in the Newton direction and refiects the effect of the error in the current 
step on the Newton decrement in the next stepf^ 



^■^We use the same notation in these lemmas as in (|4|-([6| since these relations will be used in the conver- 
gence analysis of the inexact Newton method applied to problem (|4|) . 

^^We use the same notation in the subsequent lemmas as in problem formulation (|4| despite the fact 
that the results hold for general optimization problems with self-concordant objective functions and linear 
equality constraints. 
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Lemma 5.4. Let / : M" — )■ M be a self-concordant function. Consider solving the equality 
constrained optimization problem 

minimize f{x) (45) 

subject to Ax = c, 

using an (exact) Newton method with feasible initialization, where the matrix A is in 
j^Lx{L+s) g^j^^ Y]_a,s full column rank, i.e., rank(74) = L. Let Ax be the exact Newton di- 
rection at X, i.e.. Ax solves the following system of linear equations. 

Let Ax denote any direction with 7 = Ax — Ax, and x(t) = x + tAx for t G [0, 1]. Let z 
be the exact Newton direction at x -|- Ax. If A = ■\/Ax'V^/(x)Ax < 1, then we have 

A^ 



zV^f{x + Ax)'z < ^^/z'V^f{x)z+ \iV^f{x)'z\. 

1 — A 



Proof. We first transform problem (45) into an unconstrained one via elimination technique, 
establish equivalence in the Newton decrements and the Newton primal directions between 
the two problems following the lines in jH], then derive the results for the unconstrained 
problem and lastly we map the result back to the original constrained problem. 

Since the matrix A has full column rank, i.e., rank(y4) = L, in order to eliminate the 
equality constraints, we let matrix K G ]R('^+'^)^'^ be any matrix whose range is null space of 
A, with rank(i^) = S, vector x G M.^^^ be a feasible solution for problem (45), i.e.. Ax = c. 



Then we have the parametrization of the afiine feasible set as 

{x|y4x = c} = {Ky + x\y G M^}. 
The eliminated equivalent optimization problem becomes 

minimizej^gKS F{y) = f{Ky + x). (47) 



We next show the Newton primal direction for the constrained problem (45) and un- 



constrained problem (47) are isomorphic, where a feasible solution x for problem (45) is 



mapped to y in problem (47) with Ky + x = x. We start by showing that each Ay in 
the unconstrained problem corresponds uniquely to the Newton direction in the constrained 
problem. 

For the unconstrained problem, the gradient and Hessian are given by 

VF{y) = KVf{Ky + x), V^Fiy) = K'V'^ f\Ky + x)K. (48) 

Note that the objective function / is three times continuously differentiable, which implies 
its Hessian matrix V'^f{Ky + x) is symmetric, and therefore we have V'^F{y) is symmetric, 
i.e., V^F{yy = V'F{y). 
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The Newton direction for problem (47) is given by 

-1 



Ay = - {V'F{y))-'VF{y) = -{K'V'f{x)Kr'K'Vf{ 



X) 



15 



We choose 



w = -{AA')-^A(Vf{x) + V^f(x)Ax) 

and show that (Ax, w) where 

Ax = KAy 



(49) 

(50) 
(51) 



is the unique solution pair for the linear system (|46|) for the constrained problem (|45|). To 

), we use 

We have 



estabhsh the first equation, i.e., V^/(x)Ax + Aw = — V/(x), we use the property that 

K' 

u 



A 



( K'u \ 

\ A ) ~ ^ ^°^ some u G ^^^ implies u = 0,^^ 



K' 
A 



( V^f{x)Ax + A'w + Vf{x) ) 



K'V^f{x)K{-{K'V^f{x)KY^K'Vf{x)) + K'Aw + K'Vf{x) 
AW'^f{x)Ax - Aiyf{x) + VV(a^)Ax) + AV/(x) 

' 




where the first equality follows from definition of Ax, Ay and w [cf. Eqs. (51), (49) and (50)] 
and the second equality follows the fact that K' A'w = for any w^Jj Therefore we conclude 
that the first equation in (46) holds. Since the range of matrix K is the null space of matrix 
A, we have AKy = for all y, therefore the second equation in (|46| holds, i.e., AAx = 0. 



For the converse, given a Newton direction Ax defined as solution to the system (46) for 
the constrained problem (45), we can uniquely recover a vector Ay, such that KAy = Ax. 
This is because AAx = from (46), and hence Ax is in the null space of the matrix A, i.e.. 



column space of the matrix K. The matrix K has full rank, thus there exists a unique Ay. 



Therefore the (primal) Newton directions for problems (47) and (45) are isomorphic under 



the mapping K. In what follows, we perform our analysis for the unconstrained problem (47) 



and then use isomorphic transformations to show the result hold for the equality constrained 



problem (45) 
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^The matrix K\/^f{x)K is invertible. If for some y & K^, we have K'V^f{x)K'y = 0, then 

y'K\/^f{x)K'y — {W'^ f(x))2 K'y — 0, which imphes ||i4r'a::||2 = 0, because the matrix V^/(a;) is strictly 

positive for all x. The rows of the matrix K' span R^ , therefore we have y — 0- This shows that the matrix 
K\7^f{x)K' is invertible. 

^^If K'u = 0, then the vector u is orthogonal to the row space of the matrix K' , and hence column space 
of the matrix K, i.e., null space of the matrix A. If Au = 0, then u is in the null space of the matrix A. 
Hence the vector u belongs to the set nul(A) n (nul(A)) , which implies u — 0. 

^^Let K'A'w = u, then we have ||m||2 = u'K'A'w = w'AKu. Since the range of matrix K is the null space 
of matrix A, we have AKu — for all w, hence ||u||2 = 0, suggesting u ~ Q. 
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Consider the unconstrained problem (45), let Ay denote the exact Newton direction at y 



[cf. Eq. (48)], vector Ay denote any direction in M"^, y{t) = y+tAy and A = y^Ay'V'^F(y)Ay. 
Note that with the isomorphism established earlier, we have A = ^yAy''V^F(y)Ay = 
^yAy'K'V^f{Ky + x)KAy = ^y Ai'V"^ f (x) Ax , where x = Ky + x and Ax = KAy. From 
the assumption in the th eore m, we have A < 1. For any t < 1, {y — y{t))'V'^F{y){y — y{t)) = 
t^X^ < 1 and by Lemma 5.3 for any Zy in M'^, we have 



which implies 



:i - tXyzy'F{y)zy < zy'F{y{t))zy < z'yF{y)zy 

[i-txy 



z'y{V'F{y{t)) - V^F{y))zy < ^ _^ - - l) zy'Fiy)zy 



(52) 



and 



z'yi^'Hy) - V^F(y(t)))^, < (1 - (1 - tXy) zy'Fiy)zy. 
Using the fact that 1 — (1 — tA)^ < j^-^ — 1, the preceding relation can be rewritten as 

[L — tA) 

1 



z'y{V'F{y)-V'F{y{t)))zy 



< 



{1-txy 



1 zy'Fiy)zy. 



(53) 



Combining relations (52) and (53) yields 



|4(V'^(2/) - V'F{y{t)))zy\ < (7^^^ - 1) z'yV'F{y)z, 



(54) 



Since the function F is convex, the Hessian matrix V'^F{y) is positive semidefinite. We 
can therefore apply the generalized Cauchy-Schwarz inequality and obtain 



\{Ay)'{V'F{y{t))-V'F{y)), 



(55) 



< V(Ay)'(V2F(|/(t)) - V^F{y))Ay'^z'yiV^Fiyit)) - V^F^/)); 
1 



< 



[i-txr 

1 

[i-txy 



- 1 y/iAyyV^Fiy)AyJz'V^F{y)zy 



- 1 XJz'y^Fiy)z 



?/' 



where the second inequality follows from relation (54), and the equality follows from defini- 
tion of A. 

Define the function k : M ^ M, as K{t) = VF{y{t)yZy + (1 - t){AyyV'^F{yyZy, then 

d 



dt 



nit) 



\{Ayyv'F{y{t)yZy-{Ayyv'F{y)zy\ = \{Ayy{V'F{y{t))-V'F{y))zy\, 
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which is the left hand side of (55). 



Define 7^ = Ay — Ay, which by the isomorphism, imphes 7 = Ax — Ax = K^y. By 
rewriting Ay = Ay — 7^ and observing the exact Newton direction Ay satisfies Ay = 
—'V'^F{y)~^'VF{y) [cf. Eq. (48)] and hence by symmetry of the matrix V'^F{y), we have 
AyV^Fiy) = AyV^Fiy)' = -VF(y)', we obtain 

k{0) = VFiyYzy + (Ayyv'Fiyyzy = VF{y)'zy - VF{y)'zy - i'yV^F{y)zy = -^'yV^F{y)zy. 

Hence by integration, we obtain the bound 

rt 



nit)\<\Jz'V^Fiy)z, 



y 

i-\t 







y/ W. T^o l]ds + \^'yF{y)zy\ 



(1 - sxy 



z'V^F{y)zy + WV^F{y)zy\. 



For t = 1, y{t) = y + Ay, above equation imphes 



|ft:(l)| =\VFiy + AyyZy\ < 



A2 



''V^F{y)zy + WV'F{y)zy\. 



1-X^ ' ^"^ ^ "^ 



We now specify Zy to be the exact Newton direction at y + Ay, then Zy satisfies z'yV'^F{y + 
Ay)zy = \VF{y + AyYzyl, by using the definition of Newton direction at y + Ay [cf. Eq. 



(49)], which proves 



ZyV^F{y + Ay)zy < 



X' 



1-X 



z'V^F{y)zy + \jy'F{yyZy\. 



We now use the isomorphism once more to transform the above relation to the equality 
constrained problem domain. We have z = Kzy, the exact Newton direction at x + Ax = 
X + Ky + KAy. The left hand side becomes 

z'yV^F{y + Ay)zy = z'yK'V^ f\x + A5:)Kzy = z'V^f{x + Ax)z. 

Similarly, we have the right hand sand satisfies 



A^ 



1-A 



z'V^F{y)zy + WyV'FiyYz 



X' 



1-A 
A2 



z'yK'V^f{x)Kzy + \iyK'W'f{x)Kz 



1-A 



^^z'V^f{x)z+ \-i'V^f{x)'z\. 



U 



By combining the above two relations, we have established the desired relation. 

One possible matrix K in the above proof for problem (4) is given by i^ = j „ 
whose corresponding unconstrained domain consists of the source rate variables. In the 
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unconstrained domain, the source rates are updated and then the matrix K adjusts the slack 
variables accordingly to maintain the feasibility, which coincides with our inexact distributed 
algorithm in the primal domain. The above lemma will be used to guarantee quadratic rate 



of convergence for the distributed inexact Newton method (29)]. The next lemma plays a 
central role in relating the suboptimality gap in the objective function value to the exact 
Newton decrement (see [^ for more details). 

Lemma 5.5. Let F : M" — )■ M be a self- concordant function. Consider solving the uncon- 
strained optimization problem 

minimize^^gKn F(x), (56) 

using an (unconstrained) Newton method. Let Ax be the exact Newton direction at x, 
i.e.. Ax = — V^-F(x)~"^V-F(x). Let A(x) be the exact Newton decrement, i.e., A(x) 



■\/(Ax)'V^-F(x)Ax. Let F* denote the optimal value of problem (56). If A(x) < 0.68, 
then we have 

F* > F{x) - A(x)2. (57) 



Using the same elimination technique and isomorphism established for Lemma 5^, the 
next result follows immediately. 

Lemma 5.6. Let / : M" — )■ M be a self-concordant function. Consider solving the equality 
constrained optimization problem 

minimize /(x) (58) 

subject to Ax = c, 

using a constrained Newton method with feasible initialization. Let Ax be the exact (primal) 
Newton direction at x, i.e.. Ax solves the system 

VV(x) A'\fAx\^_f V/(x) 
A J \ w J \ 

Let A(x) be the exact Newton decrement, i.e., A(x) = a/(Ax)'V^/(x)Ax. Let /* denote the 



optimal value of problem (58). If A (x) < 0.68, then we have 

r > fix) - A(x)^ (59) 

Note that the relation on the suboptimality gap in the preceding lemma holds when the 
exact Newton decrement is sufficiently small (provided by the numerical bound 0.68, see [H])- 
We will use these lemmas in the subsequent sections for the convergence rate analysis of the 
distributed inexact Newton method applied to problem (111). Our analysis comprises of two 
parts: The first part is the damped convergent phase, in which we provide a lower bound on 
the improvement in the objective function value at each step by a constant. The second part 
is the quadratically convergent phase, in which the suboptimality in the objective function 
value diminishes quadratically to an error level. 
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5.2.2 Basic Relations 

We first introduce some key relations, wtiicli provides a bound on tlie error in the Newton 
direction computation. This will be used for both phases of the convergence analysis. 

Lemma 5.7. Let {x'^} be the primal sequence generated by the inexact Newton method 
teoj). Let A(x'^) be the inexact Newton decrement at x^ [cf. Eq. (31)]. For all k, we have 



where 7^^, p, and e are nonnegative scalars defined in Assumption [2] 

Proof. By Assumption ol the Hessian matrix V^/(x'^) is positive definite for all x''. We 
therefore can apply the generalized Cauchy-Schwarz inequality and obtain 

\{j'')V^f{x^)Ax''\ < V((7'=)'V2/(a^'=)7'=)((A5'=)'VV(a;'=)A5'=) (60) 

< ^(p2A(xfc)2 + e)A(xfc)2 

< \l{p^\{x^Y + e + 2p\{x^)^e)\{x^f, 



where the second inequality follows from Assumption^ and definition of A(x'^), and the third 
inequality follows by adding the nonnegative term 2p^Je\{x^Y ^"^ ^^e right hand side. By 



the nonnegativity of the inexact Newton decrement A(x'^), it can be seen that relation (60) 
implies 

|(7')'VV(a;')Ax'^| < \{x^){p\{x^) + v^) = pXx^'f + A(x'=)v^, 

which proves the desired relation. D 

Using the preceding lemma, the following basic relation can be established, which will be 
used to measure the improvement in the objective function value. 

Lemma 5.8. Let {x'^} be the primal sequence generated by the inexact Newton method 
( |29[ ). Let fk be the objective function along the Newton direction and A(x'^') be the inexact 
Newton decrement [cf. Eqs. (42) and (3lj)] at x^ respectively. For all k with < t < 1/A(x'^), 



we have 

hit) < fk{0) - t(l - p)~Xi^'y - (1 - V~e)t~Xix'') - log(l - tXix")), (61) 

where p, and e are the nonnegative scalars defined in Assumption [2j 

Proof. Recall that Ax'^ is the exact Newton direction, which solves the system (^. Therefore 
for some w'^, the following equation is satisfied, 

V^f{x^)Ax'' + AV = -Vf{x^). 

By left multiplying the above relation by (Ax^)', we obtain 

(Ax'=)W(a;'')Ax'= + {Ax'^yA'w'' = -(A5'=)V/(x^). 
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Using the facts that Ax = Ax + 7 from Assumption [2j and AAx = by the design of 
our algorithm, the above relation yields 

{Ax^yV^f{x)Ax'' + (Ax^)'VV(x'=)7' = -(A£'=)'V/(x'^). 



By Lemma 5.7, we can bound (Ax^)'V^/(x'^)7'^ by, 

pA(x^)' + A(x'')v^ > (Ax'=)'V'/(a;')7'' > -pA(x^)' - A(x^)v^. 



Using the definition of A(x'^) [cf. Eq. (31)] and the preceding two relations, we obtain the 
following bounds on (Ax^)'V/(x'^): 

-(1 + p)A(x'=)2 - A(x'=)v^ < (A5'=)'V/(x*^) < -(1 - p)~X{x'Y + A(x^)v^. 

By differentiating the function fk(t), and using the preceding relation, this yields, 

m = V fix')' Ax' 



(62) 



Moreover, we have 



<-(l-p)A(x'=)2 + A(x'^)v^. 

^'(0) = {Ax'yv'f{x')Ax' 
= A(x^)2. 



(63) 



The function fk{t) is self-concordant for all k, therefore by Lemma 5.2, for < t < 1/A(x'^), 
the following relations hold: 

fk{t) < Mo) + tfUo) - tfm'^ - iog(i - tfj!{o)'^) 

< /fe(0) - t(l - j9)A(x'=)2 + tA(x'=)v^ - tA(x'=) - log(l - tA(x'=)) 
= fk{0) - t(l - p)A(x^)2 - (1 - v^)tA(x^) - log(l - tA(x^')), 



where the second inequality follows by Eqs. (62) and (63). This proves Eq. (61) 



D 



The preceding lemma shows that a careful choice of the stepsize t can guarantee a constant 
lower bound on the improvement in the objective function value at each iteration. We present 
the convergence properties of our algorithm in the following two sections. 

5.2.3 Damped Convergent Phase 

In this section, we consider the case when 9'' >V and stepsize d'' = p^ [cf. Eq. (32)]. We 



will provide a constant lower bound on the improvement in the objective function value in 
this case. To this end, we first establish the improvement bound for the exact stepsize choice 
oft = l/(A(x'=) + l). 
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Theorem 5.9. Let {x'^} be the primal sequence generated by the inexact Newton method 
(|29|. Let fk be the objective function along the Newton direction and A(a;*'') be the inexact 
Newton decrement at x'^ [cf. Eqs. (42) and (31)]. Consider the scalars p and e defined in 

Assumption 2 and assume that < p < ^ and < e < ( ^ ' ~^'^ — , ~ "*" ~ ^ ) , where b is the 

con.... J. U. .ep..e .,e [cf. E, §,. Fo. 0^ > . and < ^ 1/ (i*, . l), U..e 

exists a scalar a > such that 



fk{t) - h{0) <-«(!+ p) [^ ^ 



1 + 



2Vb-V + b-l 



(64) 



Proof. For notational simplicity, let y = A (x^) in this proof. We will show that for any 



positive scalar a with < a < ( | 



such a exists since e < 



(0.5-p){2Vb-V+b-l) 



2 P - (2yw6-i) ) / (P + 1)> Eq. (64) holds. Note that 

2 



By Assumption [SJ we have for O'^ >V 



l-A,.v^>_v-l-A,.v^.-^^^\^. 



(65) 



Using b > ^±^, we have y > V - {I - l) {1 + V) > 0, which implies 21^6 - 1/ + 6 - 1 > 0. 



Together with < a < ( ^ 



P 



V^b 



2Vb-V+b- 



— j / (p + 1) and 6 > ^7^, this shows 



^^2^6-^ + 6-1/1 ,^ , , 



Combining the above, we obtain 



1 



\^<y{^-p-a{l+p) 



which using algebraic manipulation yields 



y 



:i-p)y-{l-Ve) + il + y)-^<-ail+p)y. 



From Eq. (|65|), we have y > 0. We can therefore multiply by y and divide by 1 + y both 

'l+p)y^ 



sides of the above inequality to obtain 



1-P 2 

-y 



1-v^ 



y + y 



y 



< -a- 



l + y" l + y " ■ " 2(l + y) - " 1 + y 

Using second order Taylor expansion on log (1 + y), we have ioi y > 



(66) 



log (l + y) <y 



y^ 



2(1 + 2/)' 



32 



Using this relation in Eq. (66) yields, 

-y + log {l+y)< -a 



1-p 2 

y 



l + y 
Substituting the value of t - 



l+y i+y 

1/ {y + 1), the above relation can be rewritten as 



- (1 - p) ty^ -{1-V~e)ty- log (1 - ty) < 



-a- 



l + y 



Using Eq. (61) from Lemma 5.8 and definition of y in the preceding, we obtain 

,2 



fk{t)-fk{0)<-a{l+p) 



y 



y+l 



Observe that the function h (y) = ^^ is monotonically increasing in y, and for 6'^ > V hj 



relation (65) we have y > ?Yb_v±b_J,^ Therefore 

■2 /2Vb-V + b 



-a{l+p) 



y 



< 



-a 



;i+p)(- 



1 



y+l ----■-' y 5 

Combining the preceding two relations completes the proof. 



/ 1 + 



2Vb-V + b-l 



n 



Note that our algorithm uses the stepsize d'' = p^ in the damped convergent phase, 
which is an approximation to the stepsize t = l/(A(a;*^) + 1) used in the previous theorem. 



The error between the two is bounded by relation (36) as shown in Lemma 4.5, We next 



show that with this error in the stepsize computation, the improvement in the objective 
function value in the inexact algorithm is still lower bounded at each iteration. 
Let /3 = ^, where t = l/(A(x'^) + 1). By the convexity of /, we have 

fix'' + ptAx'') = /(/3(x'= + tAx^') + (1 - /3)(x'=)) < /3/(x'= + tAx^') + (1 - l3)fix''). 

Therefore the objective function value improvement is bounded by 

fix + ptAx') - fix') < Pfix' + tAx') + (1 - P)fix') - fix') 

= Pifix' + tAx')-fix')) 

= Pifkit) - fkiO)), 



where the last equality follows from the definition of fkit)- Using Lemma 
bounds on /3 as 26 — 1 < /3 < 1. Hence combining this bound with Theorem 

'2Vb~V+b-l\'^ 

fix'+^)-fix')<-i2b-l)ail+p) 



4.5 



5.9 



( = 



'-y 



(1+ 



2Vb-V+b-l 



y 



we obtain 
we obtain 

(67) 



Hence in the damped convergent phase we can guarantee a lower bound on the object function 
value improvement at each iteration. This bound is monotone in b, i.e., the closer the scalar 
b is to 1, the faster the objective function value improves, however this also requires the 
error in the inexact Newton decrement calculation, i.e., Xix') — 6', to diminish to [cf. 
Assumption [s] . 
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5.2.4 Quadratically Convergent Phase 



18 



We 



In this phase, there exists k with 6^ < V and the step size choice is d^ = 1 for all k >k. 
show that the suboptimality in the primal objective function value diminishes quadratically 
to a neighborhood of optimal solution. We proceed by first establishing the following lemma 
for relating the exact and the inexact Newton decrements. 

Lemma 5.10. Let {x''} be the primal sequence generated by the inexact Newton method 
( [29 ) and \(x''), \{x^) be the exact and inexact Newton decrements at x'^ [cf. Eqs. (30) and 
([31)]. Let p and e be the nonnegative scalars defined in Assumption [2] We have 

(68) 



(1 -pyXix") - v^ < X{x') < [l+pyXix") + V~e- 



Proof. By Assumption ol for all k, V^/(x'^) is positive definite. We therefore can apply the 
generalized Cauchy-Schwarz inequality and obtain 



x^) 



(69) 



|(Aa;*^)'VV(x'=)A£^| < ^/{{Ax^yV^f{x'')Ax>'){{Ax''yv^f{x'')A 
= A(x^)A(x'=), 



where the equality follows from definition of A(x'^) and A(x'^). Note that by Assumption [2 
we have Ax'^ = Ax^ + 7*^, and hence 



|(Ax'=)'VV(a;'^)A5l = |(A5^ + 7^)W(x'=)Ax^| 

> {Ax^yV^f{x^)A5:'' - \{-i''yv^ f{x^)Ai''\ 
yXix^'f - p\{x^f -\{x'')^e, 



(70) 



where the first inequality follows from a variation of triangle inequality, and the last inequality 



follows from Lemma 5.8 Combining the two inequalities (69) and (70), we obtain 

A(x^)A(a;'=) > \{x''f - pXix^'f - v^A(a;'=), 
By canceling the nonnegative term \{x^) on both sides, we have 

A(x^') > A(x^)-pA(x'=)- v^. 



This shows the first half of the relation (68). For the second half, using the definition of 
A(x'^), we have 

\{x^f = (Ax'yvV(a;')Ax'- 

= {Ai'' + 7'=) 'VV(a;^')(A5'= + 7^) 

= {Ai''yv^f{x^)Ai'' + {i^yV^f{x^)-i^ + 2(Ax^) 'VV(a;'')7^ 
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Note that once the condition 6**^ < T^ is satisfied, in all the following iterations, we have stepsize d'^ = 1 
and no longer need to compute 9^ . 
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where the second equahty follows from the defin ition of 7^^ [cf. Eq. (33)]. By using the 



definition of A(a; ), Assumption ^ and Lemma 



5.7, the preceding relation implies, 



Xix'^f < \{x^f + p'~X{x''f + e + 2pX{x''f + 2^e\{x^ 



< \{x^f + p^Xix^'f + 2pA(a;^)2 + 2v/^(l 



p)\{x^ 



where the second inequality follows by adding a nonnegative term of 2^/ep\{x^) to the right 
hand side. By nonnegativity of p, e, A and \{x^), we can take the square root of both sides 
and this completes the proof for relation (68). D 



Before proceeding to establish quadratic convergence in terms of the primal iterations to 
an error neighborhood of the optimal solution, we need to impose the following bound on 
the errors in our algorithm in this phase. Recall that k is an index such that 9^ <V and 
(1^ = 1 for all k >k. 

Assumption 4. Let {x^} be the primal sequence generated by the inexact Newton method 
(29). Let be a positive scalar with < 0.267. Let ^ and v be nonnegative scalars defined 
in terms of 6 as 



e 



+ v^ 20v^ + e 



1 



1 -p- (p- y/e {1 - p - (f) - ^/ey (1 -p- 0- A/e)2' 

where p and e are the scalars defined in Assumption [2j The following relations hold 

{l+p){e' + T') + V~e<<P, (71) 

t;(0.68)2 + ^<0.68, (72) 

^^ < 1, (73) 

1 — P 

p + V^<l-(4(/)2)i-0, (74) 

where r*^ > is a bound on the error in the Newton decrement calculation at step k [cf. 
Assumption [3] . 



The upper bound of 0.267 on cj) is necessary here to guarantee relation ( 74 ) can be satisfied 



by some nonnegative scalars p and e. Relation (71) can be satisfied by some nonnegative 
scalars p, e and r^, because we have 9^ < V < 0.267. Relation (71) and (72) will be used 



to guarantee the condition A(x ) < 0.68 is satisfied throughout this phase, so that we can 



use Lemma 5.6 to relate the suboptimality bound with the Newton decrement, and relation 



(73) and (74) will be used for establishing the quadratic rate of convergence of the objective 



function value, as we will show in the Theorem 5.12 This assumption can be satisfied by 
first choosing proper values for the scalars p, e and r such that all the relations are satisfied, 
and then adapt both the consensus algorithm for 9'' and the dual iterations for w'' according 
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to the desired precision (see the discussions following Assumption |2] and [3] for how these 
precision levels can be achieved). 

To show the quadratic rate of convergence for the primal iterations, we need the following 
lemma, which relates the exact Newton decrement at the current and the next step. 

Lemma 5.11. Let {x'^} be the primal sequence generated by the inexact Newton method 



. Eqs. (|30|) and 
hold. 



'hen for 



([29 ) and A(a;'^), \{x^) be the exact and inexact Newton decrements at x'' c 
([31)]. Let 6*^ be the computed inexact value of A(x'^) and let Assumption 4 
all k with X{x'') < 1, we have 

\{x''+^) < v\{x^Y + ^, (75) 



where ^ and v are the scalars defined in Assumption |4] and p and e are defined as in Assump- 
tion El 



Proof. Given A(x ) < 1, we can apply Lemma 5.4 by letting z = Ax ~^ , we have 



X{x''+^f = (Ax^+i)'V/2(x + Ax)Ax^+i 
X{x^f 



< 



1 - A(x^) 
~X{x''f 



v/(Aa;'=+i)'V2/(x)Ax'=+i + |(7'=)'VV(x)'Aa;'=+M 



< — ^^^v^(Ax'=+i)'V2/(a;)Aa;^'+i + ^y{-f''yV^f{x)-f>'^/{Ax^+^yV^f{x)Ax''+\ 
1 — A(a;^) 

where the last inequality follows from the generalized Cauchy-Schwarz inequality. Using 
Assumption [2| the above relation implies 



A(x 



k+l\2 



< 



\{x 



k\2 



+ Jp^\{x^y + e ^{Ax^+^yW^f{x)Ax^+\ 



1 - A(x^) 

By the fact that A(x^) < 6'*'' + r < < 1, we can apply Lemma 
1 / A(x'=)2 



5.3 



and obtain, 



A(x'=+i)^ < 



1-A(x'=) Vl-A(x^) 
\{x''f 



P 



2A(x'=)2 + e V(Ax'=+i)'V2/(a; + Ax)Ax''+^ 



2\(rr.k\2 



p'^\{x 



(1 - A(x'=))2 1 - A(x^O 

By dividing the last line by \{x^^^), this yields 



A (a; 



fe+i> 



A(a;'=+^) < 



A(x 



k\2 



+ 



p^\{x 



k\2 



% A(x 



A:-\2 



pA(x^) + y/e 
(l-A(x^O)^ ' 1-A(x'=) ~(1-A(x'^))2 1-A(x'=) 
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From Eq. (68), we have A(x^) < ]^_ ■ Therefore the above relation imphes 



X{x 



fc+l^ 



< 



A(x'=) + v/i 



1 — p — A(x*^) 



p\{x'') + y/e 
1 — p — \{x'^) — 



By Eq. (77), we have X{x'^) < (p, and therefore the above relation can be relaxed to 

2 



X{x 



fc+l^ 



< 



X{x' 



+ 



1 -p- (j)- y^ej l-p-0-A^ 

Hence, by definition of ^ and v, we have 

X{x''+^) < vXix'^y + ^. 



+ 



20v/e + e 



(1-p 



y^)^ 



D 



In the next theorem, building upon the preceding lemma, we apply relation (59 ) to bound 
the suboptimality in our algorithm, i.e., f{x^) — /*, using the exact Newton decrement. 
We show that under the above assumption, the objective function value /(x'^) generated 
by our algorithm converges quadratically in terms of the primal iterations to an explicitly 
characterized error neighborhood of the optimal value /*. 

heorem 5.12. Let {x^} be the primal sequence generated by the inexact Newton method 



(29) and X{x^), X{x'') be the exact and inexact Newton decrements at x^ [cf. Eqs. (30) and 



(31)]. Let /(x'^) be the corresponding objective function value at k^'^ iteration and /* denote 
the optimal objective function value for problem dll). Let Assumption 111 hold, and ^ and v 
be the scalars defined in Assumption 111 Assume that for some 6 G [0, 1/2), 



e + << 



Av' 



Then for all ?7i > 1, we have 



X{x 



k+m\ 



< 



)2'", 



+ e + 



62 



2^-1 



-1 



)2'" 



(76) 



and 



limsup„^^/(a;^+™)-r <e + 
where k is the iteration index with 6'^ < V. 



2v'' 



Proof. We prove Eq. (76) by induction. First for m = 1, from Assumption pi we have 
X{x'') < 6^ + T^. Relation (71) implies 9'^ + r^ < cj) < 1, hence we have Xix'^) < 1 and we 



can apply Lemma 5.11 and obtain 



k\2 



x{x'+')<vx{xy + i 
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By Assumption lil and Eq. (68), we have 



The above two relations imply 



(77) 



The right hand side is monotonically increasing in 0. Since cj) < 0.68, we have by Eq. (72), 
X{x'^'^^) < 0.68. By relation (74), we obtain [1 — p — (p — -^/e)^ > 40^. Using the definition 
of f , i.e., V 



(l_p_<^_^)2, 



the above relation implies vcp"^ < j-. Hence we have 



This establishes relation (76) 



A(x^+i) 
br TTi = 1. 



^^ + «- 



We next assume that Eq. (76) holds and X(x ~^"^) < 0.68 for some ?n > 0, and show that 



these also hold for m + 1. From Eqs. (68) and (73), we have 



A(x^+-) < ^(^ ) + v^ < 0-68 + v^ < 1, 
1 — p 1 — p 
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where in the second inequality we used the inductive hypothesis that A(a;'^~^™) < 0.68. Hence 



we can apply Eq. ( 75 ) and obtain 



using Eq. (72) and X{x''~^"^) < 0.68 once more, we have X{x^~^'^'^^) < 0.68. From our inductive 



hypothesis that ( 76 ) holds for m, the above relation also implies 

2 



A(a; 



fc+m+l^ 



<V 



1 



+ e + 



62 



2^-1 



1 



20m 

1 , e 



e 



V 22" 

6 22"'-i - 1 / 6 2^""-' -1 

2m+i I 92™-l ,, o2"»+i-l ~'~ '^ I ? "I" 



'v 22 



t; 22 



t> 2^ 



Using algebraic manipulations and the assumption that ^ + v^ < £;, this yields 

6 22'"+'-! - 1 



X{x 



k+m+l\ 



< 



— +e + 



V 2^ 



completing the induction and therefore the proof of relation (76). 
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Note that we do not need monotonicity in A(x'^), instead the error level assumption from relation (73) 



enables us to use Lemma 5.11 to establish quadratic rate of convergence. 
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The induction proof above suggests that the condition X{x^~^'^) < 0.68 holds for all 



m > 0, we can therefore apply Lemma [5.6[ and obtain an upper bound on suboptimality as 
follows, 



/(x^+'") -f* < (X{x~^+"')y < A(x^+"). 



Combining this with Eq. (76), we obtain 



1 A o2'"-l _ 1 

fix'+n - r < -^ + e + — ^^.;^. 

Taking limit superior on both sides of the preceding relation establishes the final result. D 

The above theorem shows that the objective function value f{x'') generated by our al- 
gorithm converges in terms of the primal iterations quadratically to a neighborhood of the 
optimal value /*, with the neighborhood of size ^ + ^, where 

0p+v^ 20v/i + e 1 



and the condition $, + v^ < ^ is satisfied. Note that with the exact Newton algorithm, we 
have p = e = 0, which implies ^ = and we can choose 6 = 0, which in turn leads to the size 
of the error neighborhood being 0. This confirms the fact that the exact Newton algorithm 
converges quadratically to the optimal objective function value. 

5.3 Convergence with respect to Design Parameter fi 

In the preceding development, we have restricted our attention to develop an algorithm for 
a given logarithmic barrier coefficient /i. We next study the convergence property of the 
optimal object function value as a function of /i, in order to develop a method to bound the 
error introduced by the logarithmic barrier functions to be arbitrarily small. We utilize the 
following result from |30j . 



Lemma 5.13. Let G be a closed convex domain, and function ghe a self-concordant barrier 
function for G, then for any x, y in interior of G, we have {y — xyVglx) < 1. 

Using this lemma and an argument similar to that in [30j, we can establish the following 
result, which bounds the sub-optimahty as a function of fj,. 

Theorem 5.14. Given yU > 0, let x(/i) denote the optimal solution of problem dll) and 
h{x{fi)) = X]j=i ~Ui{xi{fi)) . Similarly, let x* denote the optimal solution of problem (bj) 
together with corresponding slack variables (defined in Eq. (pi)), and h* = J2i=i ~Ui{x*). 
Then, the following relation holds, 

h{x{fi)) — h* < jji. 
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Proof. For notational simplicity, we write g(x) = — X]i=i log(^i)- Therefore the objective 
function for problem (El) can be written as h{x) + ixg{x). By Assumption QJ we have that the 
utility functions are concave, therefore the negative objective functions in the minimization 
problems are convex. From convexity, we obtain 

h{x*) > h{x{fx)) + {x* - x{fx)yVh{x{fj,)). (78) 

By optimality condition for x(/i) for problem ^ for a given /i, we have, 

(V/i(x(/i)) + i2Vgix{fi))yix - x(/i)) > 0, 

for any feasible x. Since x* is feasible, we have 

(V/i(x(/i)) + /iV^(x(/i)))'(x* - x(/i)) > 0, 

which implies 

Vh{x{iJi))'{x* — x{fi)) > — jiV g{x{iji))' {x* — x{jj,)). 



we 



For any /i, we have x(/i) belong to the interior of the feasible set, and by Lemma 5.13 
have for all /i, V(y'(x(/i))'(x(/i) — x(/i)) < 1. By continuity of x(/i) and the fact that the 
convex set Ax < c is closed, for A and c defined in problem (111), we have x* = liia^^Q x{fj,), 
and hence 

Vg{x{fi)y{x* — x{fi)) = lim Vg{x{fi)y{x{fl) — x{ji)) < 1. 

The preceding two relations imply 

Vh{x{n)y{x* -x{fi)) > -fi. 



In view of relation (78), this establishes the desired result, i.e. 

h{x{jj)) — h* < jj,. 



U 



By using the above theorem, we can develop a method to bound the sub-optimality 
between the objective function value our algorithm provides for problem ^ and the exact 
optimal objective function value for problem ([2]), i.e, the sub-optimality introduced by the 
barrier functions in the objective function, such that for any positive scalar a, the following 
relation holds, 

h_mizK < «, (79) 

where the value h(x{n)) is the value obtained from our algorithm for problem (111), and 
h* is the optimal objective function value for problem ([2]). We achieve the above bound 
by implementing our algorithm twice. The first time involves running the algorithm for 
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problem (0) with some arbitrary jj,. This lead s to a sequence of x^ converging to some x(/i). 
Let /i(x(/i)) = X]i=i ^Ui{xi{ji)). By Theorem 5.14, we have 



h{x{^i)) - ^i < h\ (80) 

Let scalar M be such that M = {a[h{x{n)) — /i])~^ and implement the algorithm one more 
time for problem (ll]), with /x = 1 and the objective function multiplied by M, i.e., the 
new objective is to minimize — M ^^^^ Ui{xi) — ^^^i log (xj), subject to link capacity con- 
straints^^ We obtain a sequence of x^ converges to some x{l). Denote the objective function 



value as h{x{l)), then by applying the preceding theorem one more time we have 

Mh{x{l)) - Mh* </i = 1, 

which implies 

h{x{l)) -h* < a[h (a;(/i)) - yu] < ah* 

where the first inequality follows by definition of the positive scalar M and the second 



inequality follows from relation (80). Hence we have the desired bound (79). 

Therefore even with the introduction of the logarithmic barrier function, the relative 
error in the objective function value can be bounded by an arbitrarily small positive scalar 
at the cost of performing the fast Newton-type algorithm twice. 

6 Simulation Results 

Our simulation results demonstrate that the decentralized Newton method significantly out- 
performs the existing methods in terms of number of iterations. For our distributed Newton 
method, we used the followingerror tolerance levels: p = 10~^, e = 10~^ [cf. Assumption 
^, r = 10~^ [cf. Assumption^ and when 9'^ > V = 0.12 we switch stepsize choice to be 
ff^ = 1 for all k > k. With these error tolerance levels, both Assumptions |2] and |4| can be 
satisfied. We executed distributed Newton method twice with different scaling and barrier 



coefficients according to Section 5.3 with B = 10~^ to confine the error in the objective 
function value to be within 1% of the optimal value. For a comprehensive comparison, we 
count both the primal and dual iterations implemented through distributed error checking 



method described in Appendix BJp In particular, in what follows, the number of iterations 



of our method refers to the sum of dual iterations at each of the generated primal iterate. 

^'^When M < 0, we can simply add a constant to the original objective function to shift it upward. 
Therefore the scalar M can be assumed to be positive without loss of generality. If no estimate on M is 
available apriori, we can implement the distributed algorithm one more time in the beginning to obtain an 
estimate to generate the constant accordingly. 

^"'^In these simulations we did not include the number of steps required to compute the stepsize (distributed 
summation with finite termination) and to implement distributed error checking (maximum consensus) to 
allow the possibilities that other methods can be used to compute these. Note that the number of iterations 
required by both of these computation is upper bounded by the number of sources, which is a small constant 
(8 for example) in our simulations. 
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Sample Utility Value Distributed Newton 



80 100 120 140 160 180 200 

Iteration 



Figure 9: One sample objective function value of distributed Newton method against number 
of iterations. The dotted black lines denote ±5% interval of the optimal objective function 
value. 



In the simulation results, we compare our distributed Newton method performance against 
both the subgradient method used in [25] and the Newton-type diagonal scaling dual method 
developed in |T|. Both of these methods were implemented using a constant stepsize that 
can guarantee convergence as shown in [22] and [T]. 

A sample evolution of the objective function value of the distributed Newton method 
is presented in Figure |9| This is generated for the network in Figure [Tj The horizontal 
line segments correspond to the dual iterations, where the primal vector stays constant, and 
each jump in the figure is a primal Newton update. The spike close to the end is a result of 
rescaling and using a new barrier coefficient in the second round of the distributed Newton 



algorithm [cf. Section 5.3 . The black dotted lines indicate ±5% interval around the optimal 
objective function value. 

The other two algorithms were implemented for the same problem, and the objective 



function values are plotted in Figure 10, with logarithmic scaled iteration count on the 
X-axis. We use black dotted lines to indicate ±5% interval around the optimal objective 
function value. While the subgradient and diagonal scaling methods have similar convergence 
behavior, the distributed Newton method significantly outperforms the two. 

One of the important features of the distributed Newton method is that, unlike the other 
two algorithms, the generated primal iterates satisfy the link capacity constraint throughout 



the algorithm. This observation is confirmed by Figure 11, where the minimal slacks in links 
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Figure 10: One sample objective function value of all three methods against log scaled 
iteration count. The dotted black lines denote ±5% interval of the optimal objective function 
value. 
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Figure 11: Sample minimal slack in link capacity of all three methods against log scaled 
iteration count. Negative slack means violating capacity constraint. The dotted black line 
denotes 0. 



44 



Number of Iterations for Three Algorithms 50 Random Trials 
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Figure 12: Log scaled iteration count for the 3 methods implemented over 50 randomly 
generated networks. 



are shown for all three algorithms. The black dotted line is the zero line and a negative slack 
means violating the capacity constraint. The slacks that our distributed Newton method 
yields always stays above the zero line, while the other two only becomes feasible in the end. 
To test the performances of the methods over general networks, we generated 50 random 
networks, with number of links L = 15 and number of sources S" = 8. Each routing matrix 
consists oi L X R Bernoulli random variables^^ All three methods are implemented over the 



50 networks. We record the number of iterations upon termination for all 3 methods, and 



results are shown in Figure [12] on a log scale. The mean number of iterations to convergence 
from the 50 trials is 924 for distributed Newton method, 20286 for Newton-type diagonal 
scaling and 29315 for subgradient method. 



7 Conclusions 

This paper develops a distributed Newton-type second order algorithm for network utility 
maximization problems, which can achieve superlinear convergence rate in primal iterates 
within some error neighborhood. We show that the computation of the dual Newton step 
can be implemented in a decentralized manner using a matrix splitting scheme. The key 



^^When there exists a source that does not use any links or a link that is not used by any sources, we 
discard the routing matrix and generate another one. 
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feature of this scheme is that its implementation uses an information exchange mechanism 
similar to that involved in first order methods applied to this problem. We show that even 
when the Newton direction and stepsize are computed with some error, the method achieves 
superlinear convergence rate in terms of primal iterations to an error neighborhood. Simula- 
tion results also indicate significant improvement over traditional distributed algorithms for 
network utility maximization problems. Possible future directions include a more detailed 
analysis of the relationship between the rate of convergence of the dual iterations and the 
underlying topology of the network and investigating convergence properties for a fixed finite 
truncation of dual iterations. 

A Distributed Stepsize Computation 

In this section, we describe a distributed procedure with finite termination to compute 



stepsize d^ according to Eq. (32). We first note that in Eq. (32), the scalar b G (0,1) is 
predetermined and the only unknown term is the inexact Newton decrement X{x''). In order 
to compute the value of X{x''), we rewrite the inexact Newton decrement based on definition 



(31) as A(x'=) = yY.ies(^^iy(^k)ii + Y^iec(^^'i+sy(Hk)(i+s)(i+s), or equivalently, 

{X{x')y = 5^(A5f)2(i7,),, + 5^(Axt^)2(i7,)a+5)a+5). (81) 

In the sequel, we develop a distributed summation procedure to compute this quantity 
by aggregating the local information available on sources and links. A key feature of this 
procedure is that it respects the simple information exchange mechanism used by first order 
methods applied to the NUM problem: information about the links along the routes is 
aggregated and sent back to the sources using a feedback mechanism. Over-counting is 
avoided using a novel off-line construction, which forms an (undirected) auxiliary graph that 
contains information on sources sharing common links. 

Given a network with source set iS = {1, 2, . . . , S"} (each associated with a predetermined 
route) and link set £ = {1, 2, . . . , L}, we define the set of nodes in the auxiliary graph as the 
set S, i.e., each node corresponds to a source (or equivalently, a flow) in the original network. 
The edges are formed between sources that share common links according to the following 
iterative construction. In this construction, each source is equipped with a state (or color) 
and each link is equipped with a set (a subset of sources), which are updated using signals 
sent by the sources along their routes. 
Auxiliary Graph Construction: 

• Initialization: Each link / is associated with a set G; = 0. One arbitrarily chosen source 
is marked as grey, and the rest are marked as white. The grey source sends a signal 
{label, i} to its route. Each link / receiving the signal, i.e., / G L{i), adds i to G^. 
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Iteration: In each iteration, first the sources update their states and send out signals 
according to step (A). Each hnk I then receives signals sent in step (A) from the sources 
i G S{1) and updates the set 6; according to step (B). 

(A) Each source i: 

(A. a) If it is white, it sums up |G/| along its route, using the value \Qi\ from the 
previous time. 
(A.a.l) If X]zeL(j) l®'l ^ 0' then the source i is marked grey and it sends two 

signals {neighbor, i} and {label, i} to its route. 
(A. a. 2) Else, i.e., J^ieUi) I®' I ~ '-'' source i does nothing for this iteration. 
(A. a) Otherwise, i.e., it is grey, source i does nothing. 

(B) Each link /: 
(B.a) If 9; = 0: 

(B.a.l) If it experiences signal {label, i} passing through it, it adds i to 9/. When 
there are more than one such signals during the same iteration, only the 
smallest i is added. The signal keeps traversing the rest of its route. 

(B.a. 2) Otherwise link / simply carries the signal(s) passing through it, if any, to 
the next link or node. 
(B.b) Else, i.e., 9; ^ 0: 

(B.b.l) If it experiences signal {neighbor, i} passing through it, an edge (i,j) 
with label Li is added to the auxiliary graph for all j e 9;, and then i 
is added to the set 9/. If there are more than one such signals during 
the same iteration, the sources are added sequentially, and the resulting 
nodes in the set 9^ form a clique in the auxiliary graph. Link I then stops 
the signal, i.e., it does not pass the signals to the next link or node. 

(B.b. 2) Otherwise link / simply carries the signal(s) passing through it, if any, to 
the next link or node. 

Termination: Terminate after S — 1 number of iterations. 



The auxiliary graph construction process for the sample network in Figure 13 is illustrated 
in Figure [14| where the left column reflects the color of the nodes in the original network 
and the elements of the set 9; (labeled on each link /), while the right column corresponds 



to the auxiliary graph constructed after each iteration, ^^ 

We next investigate some properties of the auxiliary graph, which will be used in proving 
that our distributed summation procedure yields the corrects values. 

Lemma A.l. Consider a network and its auxiliary graph with sets {Qi}iec- The following 
statements hold: 



^•^Note that depending on construction, a network may have different auxihary graphs associated with it. 
Any of these graphs can be used in the distributed summation procedure. 
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Figure 13: A sample network with four sources and ten links. Each link shows the flows 
(or sources) using that link. This example will be used to illustrate different parts of the 
distributed stepsize computation in this section. 

(1) For each hnk /, 6^ C S{1). 

(2) Source nodes i,j are connected in the auxiliary graph if and only if there exists a link 
/, such that {i,j} C 0^. 

(3) The auxiliary graph does not contain multiple edges, i.e., there exists at most one edge 
between any pair of nodes. 

(4) The auxiliary graph is connected. 

(5) For each hnk /, 6^ ^ 0. 

(6) There is no simple cycle in the auxiliary graph other than that formed by only the 
edges with the same label. 

Proof. We prove the above statements in the order they are stated. 

(1) Part (1) follows immediately from our auxiliary graph construction, because each source 
only sends signals to links on its own route and the links only update their set 0; when 
they experience some signals passing through them. 

(2) In the auxiliary graph construction, a link is added to the auxiliary graph only in step 
(B.b.l), where part (2) clearly holds. 

(3) From the first two parts, there is an edge between source nodes i,j, i.e., {i,j} C 0; for 
some /, only if i and j share link / in the original network. From the auxiliary graph 
construction, if sources i and j share link / then an edge with label Li between i and 
j is formed at some iteration if and only if one of the following three cases holds: 

I In the beginning of the previous iteration 0/ = and sources i,j are both white. 
During the previous iteration, source i becomes grey and sends out the signal 
{label, i} to link /, hence 0^ = {i}. In the current iteration, source j with 
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(e) State of the network t = 2 
Di 



Do 




Si S, 

(b) State of the aux- 
ihary graph t = 




Si S2 

(d) State of the aux- 
ihary graph t = 1 




Si S2 

(f) State of the aux- 
ihary graph t = 2 




(g) State of the network i = 3 
Figure 14: Steps of the construction of the auxihary graph corresponding to the network in 



Si S2 

(h) State of the aux- 
ihary graph t = 3 



Figure 13 The elements of G^ are labeled on link /. A link is drawn bold in the original 
graph if 6^ 7^ 0. 
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'^nii^Lij) I®™ I — l®n > becomes grey and sends out signal {neighbor, j} to 
link /; 

II The symmetric case of I, where first source j becomes grey and one iteration later 
source i becomes grey. 

Ill In the beginning of the previous iteration 0^ = and sources i,j are both white. 
During the previous iteration, some other source t with I E L{t) becomes grey and 
sends out the signal {label,t} to link /, hence 0/ = {t}. In the current iteration, 
both source i and j with Y.m(iL{i) |0m| > I©;! > and EmeL(i) l®"^l ^ I®' I > ^ 
become grey and send out signals {neighbor, i} and {neighbor, j} to link /. 

Hence if an edge connecting nodes i and j exists in the auxiliary graph, then in the 
beginning of the iteration when the edge is formed at least one of the nodes is white, 
and by the end of the iteration both nodes are colored grey and stay grey. Therefore 
the edges between i and j in the auxiliary graph can only be formed during exactly 
one iteration. 

We next show that only one such edge can be formed in one iteration. The first two 
cases are symmetric, and without loss of generality we only consider cases I and III. 
In both of these cases, an edge between i and j is formed with label Li only if link / 
receives the signal {neighbor, j} and 0; ^ 0. In step (B.b.l) of the auxiliary graph 
construction, the first link with 0; 7^ stops the signal from passing to the rest of its 
route, hence at most one edge between i and j can be generated. Hence part (3) holds. 

(4) By using a similar analysis as above, it is straightforward to see that if at one iteration 
source i from the original network becomes grey, then in the next iteration all the 
sources which share link with i become grey and are connected to i in the auxiliary 
graph. By induction, we conclude that all the nodes in the auxiliary graph corre- 
sponding to sources colored grey in the original network are connected to the source 
node marked grey in the initialization step, and hence these nodes form a connected 
component. 

We next show that all nodes are colored grey when the auxiliary graph construction 
procedure terminates. We first argue that at least one node is marked grey from white 
at each iteration before all nodes are marked grey. Assume the contrary is true, that is 
at some iteration no more nodes are marked grey and there exists a set of white nodes 
S* . This implies that the nodes in S* do not share any links with the nodes in S\S* 
and thus there is no path from any source in the set S\S* to any source in S* using 
the links (including the feedback mechanisms) in the original network. However, this 
contradicts the fact that all links form a strongly connected graph. Therefore after 
S* — 1 iterations all nodes in the original graph are colored grey and therefore we have 
the desired statement hold. 

(5) Analysis for part (3) suggests that all the connected nodes in the auxiliary graph are 
colored grey. In view of the part (4) , all the sources are colored grey when the auxiliary 
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graph construction procedure terminates. Step (B.a.l) implies that a hnk has 0; = 
if all sources i G S{1) are white. Since each link is used by at least one source, and all 
sources are grey, part (5) holds. 

(6) We prove part (6) by showing the auxiliary graph, when the cycles formed by the 
edges of the same label are removed, is acyclic. For each link I, let i^ denote the 
first element added to the set B; in the auxiliary graph construction process, which is 
uniquely defined for each link / by Step (B.a.l). In the set S for each link /, we define 
an equivalence class by i ~ j if {i,j} C Qi\{i'i}, which implies if and only if i and j 
are connected in the auxiliary graph and i ^ j, this link is formed by scenario III as 
defined above in the proof of part (3). 

The nodes in each equivalence class are connected by edges with the same label, which 
form the undesired cycles. We remove these cycles by merging each equivalence class 
into one representative node, which inherits all the edges going between the nodes in 
the equivalence class and S\Qi in the auxiliary graph, and is connected to i^ via one 
edge. Note the resulting graph is connected, since the auxiliary graph is by part (4) 
and all the remaining edges are generated under scenarios I and II as defined in the 
proof of part (3). 

We now show that the resulting graph contains no cycle. From cases I and II, it follows 
immediately that an edge is generated when one more source becomes grey. Therefore 
if number of noes is A^, we have A^ — 1 edges. In a connected graph, this implies we 
have a tree, i.e. acyclic, and hence part (6) holds. 

n 

We denote the set of links inducing edges in the auxiliary graph as L* = {/ | |6i| > 1} 
and for each source i the set of links which induce edges in the auxiliary graph as L*{i) = 
{I \ i E Qi,l E L*} for notational convenience. Each link can identify if it is in L* by 
the cardinality of the set 6;. Each source i can obtain |i^*(i)| along the links on its route. 
The auxiliary graph remains the same throughout the distributed Newton algorithm and 
only depends on the structure of the network (independent of the utility functions and link 
capacities), therefore given a network, the above construction only needs to be preformed 
once prior to execution of the distributed Newton algorithm. 



We next present a distributed procedure to compute the sum in Eq. (81) and show 



that the sets 0; constructed using the above procedure avoids over-counting and enables 
computation of the correct values, 
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Distributed Summation Procedure 

Initialization: Each link / initializes to -2^(0) = 0. Each source i computes y* 

1 

\s{i)\ 



• 



{Axfy{Hk)ii and each link / computes Zi = r^T^(Axf^_5)^(iJfc)(i+5)(«+5)- Each source i 



^^Notc that the execution of the procedure only uses the sets O/, L* , and L*{i).We will use the structure 
of the auxiliary graph in proving the correctness of the procedure. 
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aggregates the sum 

y^io) =y*+Yl 'I (82) 

«eL(j) 
along its route. 

• Iteration for t = 1, 2, . . . , S*. The following 3 steps are completed in the order they are 
presented. 

a. Each source i sends its current value yi(t) to its route. 

b. Each link / uses the yi(t) received and computes 

ziit) = J2 y^^^ - 1) - (i®'i - 1) -^'(^ - 1)- (83) 

c. Each source i aggregates information along its route from the links / G L*[i) and 
computes 

y.{t)= J2 zi{t)-{\L*{t)\-l)y,{t-l). (84) 

ieL*(i) 

• Termination: Terminate after S number of iterations. 
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By the diagonal structure of the Hessian matrix H^, the scalars (Aa:^)^(iffc)jj and {l^x^j^gY {Hk) (i^s){i+s) 
are available to the corresponding source i and link I respectively, hence z^ and y* can be 
computed using local information. In the above process, each source only uses aggregate 
information along its route and each link / only uses information from sources i G S{1). The 
evolution of the distributed summation procedure for the sample network in Figure [13] is 



shown in Figures 15 and 16 



We next establish two lemmas, which quantifies the expansion of the t-hop neighborhood 
in the auxiliary graph for the links and sources. This will be key in showing that the 
aforementioned summation procedure yields the correct values at the sources and the links. 
For each source i, we use the notation A/i(t) to denote the set of nodes that are connected 
to node i by a path of length at most t in the auxiliary graph. Note that A/i(0) = {?}. We 
say that node i is t-hops away from node j is the length of the shortest path between nodes 
i and j is t. 

Lemma A. 2. Consider a network and its auxiliary graph with sets {Qi}i^c- Fot any link / 
and all t > 1, we have, 

M(t) n Afj{t) = Umee,^fm{t - 1) for i, j e 6; with i ^ j. (85) 



Proof. Since the source nodes i,j E Qi, by part (2) of Lemma A.l , they are 1-hop away from 
all other nodes in O^. Hence if a source node n is in A/'m(t — 1) for m E Qi, then n is at most 
t-hops away from i ot j. This yields 

u™e©, Afm{t-i)c Miit) n M,{t). (86) 

On the other hand, if n G Miit) fl Afj{t), then we have either n G ^(t — 1) and hence 

n G UrneSiMnit - 1) OT 

Let P{a, b) denote an ordered set of nodes on the path between nodes a and b including b 
but not a for notational convenience. Then the above relation implies there exists a path 
with \P{i,n)\ = t and \P{j,n)\ < t. Let n* G P{i,n) nP{j,n) and P{j,n*) nP{j,n*) = 0. 
The node n* exists, because the two paths both end at n. If n* ^ Qi, then we have a cycle of 
{P{i,n*),P{n*,j),P{j,i)}, which includes an edge with label Li between i and j and other 



edges. In view of part (6) of Lemma A.l, this leads to a contradiction. Therefore we obtain 
n* G 0/, implying P[i,n) = {P{i,n*),P{n*,n)}. Since i is connected to all nodes in 0^, 
\P{i,n*)\ = 1 and hence \P{n*,n)\ =t-l, which implies n G Mn*(t- 1) C ^meei-^mit - !)• 
Therefore the above analysis yields 

^fi{t)n^fJ{t) C Uraee.Mmit ' !)• 



With relation (86), this establishes the desired equality. D 



Lemma A. 3. Consider a network and its auxiliary graph with sets {Qi}if^c- For any source 
i, and all t > 1, we have, 

(Uje0;A/;(t)) n {Uj^e^Afj{t)) = Afi{t) for l,m e L*{i) with / y^ m. (87) 
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Figure 15: Evolution of distributed summation process, where pi = yi{0) and destination 
node is indicated using a dot with the same gqlor as its corresponding source. 
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Figure 16: Evolution of distributed summation process continued, where pi = yi{0) and 
destination node is indicated using a dot with the same color as its corresponding source. 

Proof. Since l,m E L*{i), we have i E Qi and i G 9^, this yields, 

M{t) C (U,eeX.(i)) n (U,ee„Ar,(t)) . 

On the other hand, assume there exists a node n with n G {Uj(=Q^Afj(t)) fl (Ujge™,-^(i))) 
and n ^ A/i(t). Then there exists a node p E Qi with p 7^ z and n G Mp{t). Similarly there 
exists a node q G 6^ with g 7^ z and n G Afq(t). Let P{a,b) denote an ordered set nodes 
on the path between nodes a and b including b but not a for notational convenience. Let 
n* G P{p,n) n P{q,n) and P{p,n*) fl P{q,n*) = 0. The node n* exists, because the two 
paths both end at n. Since nodes 2,p are connected via an edge with label Li and i,q are 
connected via an edge with label L^, we have a cycle of {P{i,p), P{p,n), P{n,q), P{q,i)}, 



which contradicts part (6) in Lemma A.l and we have 

The preceding two relations establish the desired equivalence. 



D 



Equipped with the preceding lemma, we can now show that upon termination of the 
summation procedure, each source i and link / have yi{S) = zi{S — 1) = (A(x^))^ [cf. Eq. 



(81) 



Theorem A. 4. Consider a network and its auxiliary graph with sets {0/}/g£. Let ^2 denote 
the set of all subsets of S and define the function a : fi — t- M as 



ff(A') 






^r$^^{«eLW} + $^?/*: 



i£K 



where y* = {Ax^f{Hk)ii, z^ = j^^{Ax'l^sy{Hk){i+s){i+s) and I{i(^L{i)} is the indicator func- 
tion for the event {/ G L{i)}. Let yi{t) and Zi{t) be the iterates generated by the distributed 
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summation procedure described above. Then for all t G {1, . . . , S*}, the value zi{t) at each 
link satisfies 

zi{t)=a{U,eeMi-^))^ (88) 

and the value yi(t) at each source node satisfies 



y,it)=aiM,it))- 

Proof. We use induction to prove the theorem. 

Base case: t = 1. 

Since zi{0) = for all links, Eq. ([83]) for t = 1 is 



iee. 



(89) 



ieBi leL(i) 



where we use the definition of y{0) [cf. Eq. (82)] and the function a{-). Since A/i(0) = i, the 
above relation implies Eq. (88) holds. 

For source i, from update relation ([84]), we have 

y,{l)= 5^ a(eO-(|L*(.)|-l)y,(0). 



Lemma A. 3 and inclusion-exclusion principle imply 



J2 ^(©0 = a(U,eL.w0O + i\L*m - 1) a(z). 



Since yj(0) = a(i) based on the definition of yi{0) [cf. Eq. (82)], by rearranging the preceding 
two relations, we obtain 

yiil) = a{Ui^L'^i)Qi) = a(M(l)), 



which shows Eq. ( 89 ) holds for t = 1 . 
Inductive step for t = T >2. 

Assume for t = T — 1, Eqs. (p9]) and ([88[) hold, we first show that Eq. ((88|) hold. When 



t = T, by update equation ( [83[ ), we obtain for link / 

z,{T)=J2miT-l)-{\Qi\-l)^iiT-l) 
ieSi 

= ^a(A/-.(T-l))-(|0,|-l)^,(T-l) 



where the second equality follows from Eq. (89) for t = T — 1. 



If \Qi\ = 1, then we have zi{T) = a{N'i{T — 1)), for i G 9^, therefore Eq. (88) is satisfied. 
For |0;| > 1, using Lemma A. 2) for t = T and by inclusion-exclusion principle, we obtain 

J2 <^i{T - 1)) = ^ (Uee,M(r - 1)) + (le^l - 1) a(U™ee,Ar„(r - 2)). 
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Eq. (88) for t = T — 1 yields zi(T — 1) = cr(Umee,A/'m(T — 2)). By using this fact and 
rearranging the preceding two relations, we have Eq. (88) holds for t = T, i.e., 



zi{T) = aiU,eeMiT-l)). 



We next establish Eq. (89). From update equation (84), using the preceding relation, we 
have 



y. 



(T)= J2 ^/(T)-(|L*(z)|-l)y.(T-l) 



ieL*(i) 



/eL*(j) 



Lemma A. 3 and inclusion-exclusion principle imply 



J2 cr (U,ee,M(T - 1)) = a(U,ez.*(0 U.ee, M(T - 1)) + (|L*(z)| - 1) a{M{T - 1)). 



By definition of A/'j(-), we have U/g2,*(j) Ujgei -^iiT — 1) = N'i{T). By using Eq. (89) for 



t = T — 1, i.e., yi{T — 1) = a{N'i{T — 1)) and rearranging the above two equations, we obtain 

y{T) = a(M(T)), 

which completes the inductive step. D 

Using definition of the function o"(-), we have (A(x'^))^ = o"(5). By the above theorem, 
we conclude that after S iterations, 

y,{S) = a{Af,{S)) = a{S) = {X{x'')f. 

By observing that Ui(zQ^Afi{S — 1) = 5, we also have 

z,iS -l) = a iU,eeMS - 1)) = ^(5)) = (A(x'=))^ 

This shows that the value X{x^y is available to all 



A.l 



where we used part (5) of Lemma 
sources and links after S — 1 iterations. 

Note that the number 5* is an upper bound on the number of iterations required in the 
distributed summation process to obtain the correct value at the links and sources in the 
original graph. If the value of the diameter of the auxiliary graph (or an upper bound on 
it) is known, then the process would terminate in number of steps equal to this value plus 
1. For instance, when all the sources share one common link, then the auxiliary graph is a 
complete graph, and only 2 iterations is required. On the other hand, when the auxiliary 
graph is a line, the summation procedure would take S iterations. 

We finally contrast our distributed summation procedure with spanning tree compu- 
tations, which were used widely in 1970s and 1980s for performing information exchange 
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among different processors in networlc ffow problems. In spanning tree based approaches, 
information from all processors is passed along the edges of a spanning tree, and stored at 
and broadcast by a designated central root node (see [23] and [10]). In contrast, our sum- 
mation procedure involves (scalar) information aggregated along the routes and fed back 
independently to different sources, which is a more natural exchange mechanism in an en- 
vironment with decentralized sources. Moreover, processors in the system (i.e., sources and 
links) do not need to maintain predecessor /successor information (as required by spanning 
tree methods). The only network-related information is the sets 9i for / G C kept at the 
individual links and obtained from the auxiliary graph, which is itself constructed using the 
feedback mechanism described above. 

B Distributed Error Checking 

In this section, we present a distributed error checking method to determine when to termi- 
nate the dual computation procedure to meet the error tolerance level in Assumption [2] at a 
primal iteration k. The method involves two stages: in the first stage, the links and sources 
execute a predetermined number of dual iterations. In the second stage, if the error tolerance 
level is not satisfied in the previous stage, the links and sources implement dual iterations 
until some distributed termination criteria is met. For the rest of this section we suppress 
the dependence of the dual vector on the primal iteration index k for notational convenience 
and we adopt the following assumption on the information available to each node and link. 

Assumption 5. There exists a positive scalar F < 1 such that the spectral radius of the 
matrix M = {D}. + Bk)~^{Bk — Bk) satisfies p{M) < F. Each source and link knows the scalar 
F and the total number of sources and links in the graph, denoted by S and L respectively. 



As noted in Section 4.2, the matrix M = {Dk + Bk)^^{Bk — Bk) is the weighted Laplacian 
matrix of the dual graph, therefore the bound F can be obtained once the structure of 
the dual graph is known [12], [7]. In this assumption, we only require availability of some 
aggregate information, and hence the distributed nature of the algorithm is preserved. Before 
we introduce the details of the algorithm, we establish a relation between \\w* — u'(t)||^ and 
||w(t + l) — u^(^)||oo5 which is a key relation in developing the distributed error checking 
method. 

Lemma B.l. Let the matrix M be M = {Dk + Bk)~^{Bk — Bk). Let w{t) denote the dual 



variable generated by iteration (18), and w* be the fixed point of the iteration. Let F and 



L be the positive scalar defined in Assumption |5| Then the following relation holds 



w*-w{t)\\^<^\\wit+l)-wm^. (90) 



Proof. Iteration ( 18 ) implies that the fixed point w* satisfies the following relation, 

w* = Mw* + {Dk + Bkr'i-AH.'Vfix")), 
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and the iterates w(t) satisfy, 

w{t + 1) = Mw{t) + {Du + Bk)-\-AH^'Vf{x'')). 
By combining the above two relations, we obtain, 

w{t + 1) - w{t) = (/ - M){w* - w{t)). 
Hence, by the definition of matrix infinity norm, we have 

w{t)\\^<\\{l-M)-^\\\w{t + l)-w{t)\\^ 



\w 



Using norm equivalence for finite dimensional Euclidean space and theories of linear algebra, 
we obtain \\{I - M)-^\\^ < VL\\{I - M)-^\\^ < iqj^ [E], [2]. For the symmetric real 
matrix M we have p{M) = ||M||2, and hence we obtain the desired relation. D 

We next use the above lemma to develop two theorems each of which serves as a starting 
point for one of the stages of the distributed error checking method. 

Theorem B.2. Let {x'^} be the primal sequence generated by the inexact Newton method 
(29) and Hk be the corresponding Hessian matrix at the k^'^ iteration. Let w{t) be the 



inexact dual variable obtained after t dual iterations (18) and w* be the exact solution to 
(11), i.e. the limit of the sequence {w{t)} as t — )■ oo. Let vectors Ax'' and Ax'' be the exact 



and inexact Newton directions obtained using w* and w{t) [cf. Eqs. (27)-(28)], and vector 
7^^ be the error in the Newton direction computation at x^ , defined by 7^= Ax'' — Ax''. 
For some positive scalar p, let 



LiH^%W)\\\w{t + l)-wit)\l 



for each source i, and 



Pi 



[1 - F)iH-%[R'Ywit) 



y^j:^eSii)iHk%\m\ \\w{t + l)-w{t)\l 



(91) 



'l-^)Eie5(o(^fc^' 



i[R'Yw{t) 



(92) 



for each link /. Define a nonnegative scalar P'' as 

(3'' 
Then we have 



Pi Pi 

max < max — , max — 

\^ [ i£S p lec p 



P\^'')'Hr.^''<p\Ai'')'Hu{A5:''). 



(93) 
(94) 
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Proof. For notational convenience, we let matrix P^. denote the S x S principal submatrix 
of Hk, i.e. {Pk)u = {Hk)ii for i < S, vector As in R^ denote the first S components of the 
vector Ax'^, vector Aiji in M.^ denote the last L components of the vector Ax^. Similarly, we 
denote by As and Ay the first 5* and last L components of the exact Newton direction Ax 



respectively. From Eq. (27), we have for each i E S, 

As,; - A5,; iH^'),^[R'fiw* - wit)) 



As,; 



< 



< 



{H,%[R'Yw{t) 
iH,%[R'Ye\\w*-wit)\l 



{H^%[R'Yw{t) 
VL{H,%\L{z)\\\w{t + l)-w{t)\l 



pi 



{l-F)iH^%[Rfw{t) 
where the first inequality follows from the element-wise nonnegativity of matrices H^ and 



R, and the second inequality follows from relation (|90|). 
for each li 



Similarly for each link / G L, by relations (27) and (28) we obtain 

[Ryp,;^R'{w* - w{t)) 



^m 



< 



< 



[pyp^'Rwit) 

j:.esii)iPk%R'e\\w*-w{t)\l 



^j:^^sii)iHj:%\m\ \\w{t+l)-wit)\l 



Ph 



il-F)E.eSii)iHk%[R']Mt) 
where the first inequality follows from the structure of the matrix R and the element-wise 



nonnegativity of matrices H^ and R, and the second inequality follows from relation (90) 
and the definition for matrix P^. 

The definition for f3^ [cf. Eq. (93)] implies that 



P 



^ 



max < max Oj, max oi 

' i&S i&c 



ASl 



Therefore the preceding relations imply that 

V^|7f| <p\Ax^ 
which implies the desired relation. 



< 



//jfe 



and 






< 



I.e., 



D 



Theorem B.3. Let {x'^} be the primal sequence generated by the inexact Newton method 
(29) and H^ be the corresponding Hessian matrix at k*^ iteration. Let w{t) be the inexact 
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dual variable obtained after t dual iterations (18) and w* be the exact solution to (11), i.e. 
the limit of the sequence {w{t)} as t — )■ oo. Let vectors Ax^ and Ax'' be the exact and inexact 



Newton directions obtained using w* and w(t) [cf. Eqs. (27)-(28)] respectively, and vector 7*^ 
be the error in the Newton direction computation at x'' , defined by 7*^ = Aa;^ — Ax'^. For 
some scalar f3 and e where < (3'' < 1 and e > 0, let 



hi 



1-F 



;i-/3'=)(L + 5)L 1^(^)1(^-1. 



(95) 



for each source i, and 



1-F 



for each link /. Define a nonnegative scalar h as 



(s+i){s+i) Y.ieS{L) \Hi)\iHk)ii 



(96) 



Then the condition 
implies 



h = I min < min hi , min hi 

\ [ JG-S leC 



\w{t + l)-wmoo<h 



iryHki' < 



(97) 



(9^ 



(99) 



1-13''' 
Proof. We let matrix P^ denote the S x S principal submatrix of Hf^, i.e. {Pk)ii = {Hk)ii for 



i < S, for notational convenience. The definition of h [cf. Eq. (97)] and relation (98) implies 

\\w{t + l)-w{t)\\^<h„eS, 

and 

\\wit + 1) - wit)\\^ <hu fori e£. 



Using relation (90) and the definition of hi and hi [cf. Eqs. (95) and (96)], the above two 
relations implies respectively that 



\w*-wmoo< 



;i-/3^)(L + s)|^(,)l(^^ 



1 ; 
k )ii 



for i e S, 



(100) 



and 



w-wmoo< 



(1 - /3'=)(L + S) (hI\,^i)^,_^,^ J2.^^^^) imUHk) 



fori eC. 



(101) 
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By using the element-wise nonnegativity of matrices H and A, we have for each source z, 
where the last equality follows from the fact that [R'^e = \L{i)\ for each source i. 



The above inequality and relation ( 100 ) imply 



(l-/3^)(L + 5)' 
By the definition of matrices Pk and R, we have for each link /, 



(102) 



iHi)^s+iKs+i) ( RPj:'R'iw*-wit)) y I < iHi)^s+i)is+i)[RyPk'R'e\\w* - ^(t)|L 



When combined with relation (|101|), the preceding relation yields 



(H^ 



k)iS+l){S+l) 



{RPk'R'i 



w 



w 



m 



< 



(l-/3'=)(L + 5)- 



(103) 



From Eqs. (27)-(28) and the definition of 7, we have 



7i 



Pk^R'iw* 
RP^^R'iw* 



w{t)) 
-wit)) 



which implies that 



ies 



,[R'Y{w* - w{t))y + J2 ((4)(s+0(s+0 {RPk'R'i 

lec 



w 



w 



m' 



< 



1-/3^=' 



where the inequality follows from (102), (103), which establishes the desired relation. 



n 

We develop the distributed error checking method based on the preceding two theorems: 



Stage 1: The links and sources implement T iterations of (18), where T is a prede- 



termined globally known constant. The links and sources then use Theorem B.2 with 
t = T — 1 and p as the desired relative error tolerance level defined in Assumption |2] 
to obtain a value (3''. If /3^ > 1, then the dual iteration terminates. 



• Stage 2: The links and sources use Theorem B.3 with /3'^ obtained in the first stage 
and e defined in Assumption [2] to obtain value h. Then they perform more iterations 



of the form (18) until the criterion (98) is satisfied 



25 



^^The error tolerance level will terminate after finite number of iterations for any h > 0, due to the 
convergence of the sequence w{t) established in Section 



5.1 
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Stage 1 corresponds to checking the term p'^{Ax^yHk{Ax'^), while Stage 2 corresponds 
to the term e in the error tolerance level. If the method terminates the dual iterations in 
Stage 1, then Assumption |2| is satisfied for any e > 0; otherwise, by combining relations (94) 



and (99), we have 



(/3^ + (1 - (5^)){-i^)'Hk-j^ < p\A£''yHk{Ax'') + e 



which shows that the error tolerance level in Assumption [2] is satisfied. 

To show that the above method can be implemented in a distributed way, we first rewrite 
the terms pi, pi, hi and hi and analyze the information required to compute them in a 
decentralized way. We use the definition of the weighted price of the route IIj (t) and obtain 



n.(t) = {Hi:' 

for all links I. 



EleL(i)^it) 



{H-,') 



i[E!]'w{t) and ni(0) = {H^ 



where Wi(0) 



Therefore relations (91) and (92) can be rewritten as 



m,{S))\\w{t + i)-w(t)\i 



pi 



;i-F)n,(t) 



v^Ei65(/)ni(o)|Kt + i)-Mt)||^ 



;i-^)E.e5(on.W 



Similarly, relations (95) and (96) can be transformed into 



hi 



I3^){L + S)L 



MO)iH, 



h, 



In our dual variable computation procedure, the values vrj(O), nj(0) and Ili{t) are made avail- 
able to all the links source i traverses through the feedback mechanism described in Section 

'Hk)ii for source i and {Hk)(s+i)(s+i) 



4.2 



Each source and node knows its local Hessian, i.e.. 



for link /. The value (3 is available from the previous stage. Therefore in the above four 
expressions, the only not immediately available information is ||w(t + 1) — w{t)\\^, which 
can be obtained using a maximum consensus algorithm]^ Based on these four terms, the 
values of /3^ and h can be obtained using once again maximum consensus and hence all the 
components necessary for the error checking method can be computed in a distributed way. 
We observe that in the first T iterations, i.e.. Stage 1, only two executions of maximum 
consensus algorithms is required, where one is used to compute \\w{t + 1) — w(t)||^ and the 



^^In a maximum consensus algorithm, each node starts with some state and updates its current state with 
the maximum state value in its neighborhood (including itself) . Therefore after one round of algorithm, the 
neighborhood of the node with maximal value has now the maximum value, after the diameter of the graph 
rounds of algorithm, the entire graph reaches a consensus on the maximum state value and the algorithm 
terminates. 
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other for /3'^. On the other hand, even though the computation of the value h in Stage 2 
needs only one execution of the maximum consensus algorithm, the term | |w(t + 1) — w(t) 1 1^ 
needs to be computed at each dual iteration t. Therefore the error checking in Stage 1 can 
be completed much more efficiently than in Stage 2. Hence, when we design values p and e 
in Assumption [2| we should choose p to be relatively large, which results in an error checking 
method that does not enter Stage 2 frequently, and is hence faster. 
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